DMS: Dynamic Model Scaling for Quality-Aware Deep Learning Inference in Mobile and Embedded Devices

Cited by: 13
Authors
Kang, Woochul [1 ]
Kim, Daeyeon [1 ]
Park, Junyoung [1 ]
Affiliations
[1] Incheon Natl Univ, Dept Embedded Syst Engn, Incheon 22012, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Deep learning; edge devices; embedded systems; energy efficiency; feedback control; filter pruning; mobile devices; model compression; quality-of-service; QoS;
DOI
10.1109/ACCESS.2019.2954546
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Recently, deep learning has brought revolutions to many mobile and embedded systems that interact with the physical world using continuous video streams. Although there have been significant efforts to reduce the computational overhead of deep learning inference in such systems, previous approaches have focused on delivering 'best-effort' performance, resulting in unpredictable behavior under variable environments. In this paper, we propose a runtime control method, called DMS (Dynamic Model Scaling), that enables dynamic resource-accuracy trade-offs to support various QoS requirements of deep learning applications. In DMS, the resource demands of deep learning inference are controlled by adaptive pruning of computation-intensive convolution filters. DMS avoids the irregularity of pruned models by reordering filters according to their importance, so that a varying number of filters can be applied efficiently. Since DMS's pruning method incurs no runtime overhead and preserves the full capacity of the original deep learning models, DMS can tailor the models at runtime for concurrent deep learning applications with their respective resource-accuracy trade-offs. We demonstrate the viability of DMS by implementing a prototype. The evaluation results show that, if properly coordinated with system-level resource managers, DMS can sustain robust and efficient inference performance against unpredictable workloads.
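The mechanism sketched in the abstract, reordering convolution filters by importance so that a prefix of the filters can be applied at run time, can be illustrated with a short PyTorch sketch. This is a minimal illustration under stated assumptions, not the paper's implementation: the L1-norm importance criterion, the `ScalableConv` wrapper, and the `scale` attribute are hypothetical names introduced here for exposition, and a full network would also have to narrow the input channels of each following layer accordingly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def reorder_filters_by_importance(conv: nn.Conv2d) -> nn.Conv2d:
    # Hypothetical importance criterion: L1 norm of each output filter.
    # Sorting once, offline, makes "keep the first k filters" equivalent to
    # "keep the k most important filters" at run time.
    with torch.no_grad():
        importance = conv.weight.abs().sum(dim=(1, 2, 3))
        order = torch.argsort(importance, descending=True)
        conv.weight.copy_(conv.weight[order])
        if conv.bias is not None:
            conv.bias.copy_(conv.bias[order])
    return conv

class ScalableConv(nn.Module):
    # Illustrative wrapper (not from the paper): a runtime controller sets the
    # effective width of the layer through the `scale` attribute.
    def __init__(self, conv: nn.Conv2d):
        super().__init__()
        self.conv = reorder_filters_by_importance(conv)
        self.scale = 1.0  # fraction of filters used for the next inference

    def forward(self, x):
        k = max(1, int(self.scale * self.conv.out_channels))
        weight = self.conv.weight[:k]
        bias = self.conv.bias[:k] if self.conv.bias is not None else None
        return F.conv2d(x, weight, bias,
                        stride=self.conv.stride,
                        padding=self.conv.padding,
                        dilation=self.conv.dilation)

layer = ScalableConv(nn.Conv2d(3, 64, kernel_size=3, padding=1))
layer.scale = 0.5                       # e.g., a QoS controller halves the width
out = layer(torch.randn(1, 3, 32, 32))  # out.shape == (1, 32, 32, 32)
```

Because only a contiguous slice of the (already ordered) weight tensor is used, scaling down requires no index bookkeeping or weight copies at inference time, which is consistent with the abstract's claim that the pruning method itself incurs no runtime overhead.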
Pages: 168048-168059
Page count: 12