Power- and Time-Aware Deep Learning Inference for Mobile Embedded Devices

Cited by: 7
Authors
Kang, Woochul [1]
Chung, Jaeyong [2]
Affiliations
[1] Incheon Natl Univ, Dept Embedded Syst Engn, Incheon 22012, South Korea
[2] Incheon Natl Univ, Dept Elect Engn, Incheon 22012, South Korea
Funding
National Research Foundation, Singapore;
Keywords
Deep learning; DVFS; feedback control; embedded systems; low power; power-awareness; Quality-of-Service; QoS; real-time;
DOI
10.1109/ACCESS.2018.2887099
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Deep learning is a state-of-the-art approach that provides highly accurate inference for many cyber-physical systems (CPS) such as autonomous cars and robots. Deep learning inference often needs to be performed locally on mobile and embedded devices, rather than in the cloud, to address concerns such as latency, power consumption, and limited bandwidth. However, existing approaches have focused on delivering "best-effort" performance on resource-constrained mobile embedded devices, resulting in unpredictable performance in the highly variable environments of CPS. In this paper, we propose a novel deep learning inference runtime, called DeepRT, that supports multiple QoS objectives simultaneously against unpredictable workloads. DeepRT uses a multiple-input/multiple-output (MIMO) modeling and control methodology as its primary tool for supporting multiple QoS goals, including inference latency and power consumption. DeepRT's MIMO controller coordinates multiple computing resources, such as CPUs and GPUs, by capturing their close interactions and their effects on multiple QoS objectives. We demonstrate the viability of DeepRT's QoS management architecture by implementing a prototype of DeepRT. The evaluation results show that, compared with baseline approaches, DeepRT can robustly support the desired inference latency and power consumption for various deep learning models.
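The idea of one controller steering two coupled resources toward two QoS goals can be illustrated with a small simulation. The sketch below is not DeepRT's controller: it assumes a made-up static 2x2 linear model relating CPU and GPU frequency to latency and power, and applies a plain integral control law through the inverse of that model. The gain matrix `G`, the offsets `D`, and the names `plant`, `invert_2x2`, and `control_loop` are all hypothetical, and actuator limits (the frequency ranges a real device supports) are ignored.

```python
# Hypothetical static gain model (all values are invented for illustration).
# Rows: outputs (latency in ms, power in W).
# Columns: inputs (CPU frequency in GHz, GPU frequency in GHz).
G = [[-40.0, -60.0],
     [2.0, 5.0]]
D = [200.0, 1.0]  # hypothetical output offsets when both inputs are zero


def plant(u):
    """Simulated device: map (cpu_freq, gpu_freq) to (latency, power)."""
    return [G[0][0] * u[0] + G[0][1] * u[1] + D[0],
            G[1][0] * u[0] + G[1][1] * u[1] + D[1]]


def invert_2x2(m):
    """Return the inverse of a 2x2 matrix given as nested lists."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    if det == 0:
        raise ValueError("model matrix is singular")
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def control_loop(reference, u0, gain=0.5, steps=30):
    """Integral MIMO control: u <- u + gain * G^-1 * (reference - y).

    Both inputs are updated together from both errors, so the coupling
    between CPU and GPU captured in G is taken into account.
    """
    g_inv = invert_2x2(G)
    u = list(u0)
    for _ in range(steps):
        y = plant(u)  # measure latency and power
        e = [reference[0] - y[0], reference[1] - y[1]]
        u = [u[0] + gain * (g_inv[0][0] * e[0] + g_inv[0][1] * e[1]),
             u[1] + gain * (g_inv[1][0] * e[0] + g_inv[1][1] * e[1])]
    return u, plant(u)


if __name__ == "__main__":
    # Target: 100 ms latency at 7 W, starting from 1 GHz on both units.
    freqs, outputs = control_loop([100.0, 7.0], [1.0, 1.0])
    print("frequencies:", freqs)
    print("latency, power:", outputs)
```

With a perfectly known static model, each step shrinks both errors by the factor `1 - gain`, so the loop settles on the single frequency pair that meets both targets at once. A real runtime would have to identify the model from measurements, cope with dynamics and noise, and clamp the inputs to the supported frequency levels.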
Pages: 3778-3789
Page count: 12