Power- and Time-Aware Deep Learning Inference for Mobile Embedded Devices

Cited by: 7
Authors
Kang, Woochul [1]
Chung, Jaeyong [2]
Affiliations
[1] Incheon Natl Univ, Dept Embedded Syst Engn, Incheon 22012, South Korea
[2] Incheon Natl Univ, Dept Elect Engn, Incheon 22012, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Deep learning; DVFS; feedback control; embedded systems; low power; power-awareness; Quality-of-Service; QoS; real-time;
DOI
10.1109/ACCESS.2018.2887099
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Deep learning is a state-of-the-art approach that provides highly accurate inference for many cyber-physical systems (CPS) such as autonomous cars and robots. Deep learning inference often needs to be performed locally on mobile and embedded devices, rather than in the cloud, to address concerns such as latency, power consumption, and limited bandwidth. However, existing approaches have focused on delivering "best-effort" performance to resource-constrained mobile embedded devices, resulting in unpredictable performance under highly variable environments of CPS. In this paper, we propose a novel deep learning inference runtime, called DeepRT, that supports multiple QoS objectives simultaneously against unpredictable workloads. In DeepRT, the multiple inputs/multiple outputs (MIMO) modeling and control methodology is proposed as a primary tool to support multiple QoS goals including the inference latency and power consumption. DeepRT's MIMO controller coordinates multiple computing resources, such as CPUs and GPUs, by capturing their close interactions and effects on multiple QoS objectives. We demonstrate the viability of DeepRT's QoS management architecture by implementing a prototype of DeepRT. The evaluation results demonstrate that, compared with baseline approaches, DeepRT can support the desired inference latency as well as power consumption for various deep learning models in a highly robust manner.
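The MIMO control idea in the abstract can be illustrated with a minimal sketch. This is not DeepRT's controller: the linear plant model, the gain matrix `G`, the offsets, and the function names below are all hypothetical values chosen for illustration. It assumes two inputs (CPU and GPU frequency) and two outputs (inference latency and power), and applies an integral update through the inverse of the assumed model so that both outputs are steered together rather than by two independent loops.

```python
# Hypothetical 2x2 MIMO integral controller. Every number and name here is an
# illustrative assumption, not a value or API from the DeepRT paper.

def inv2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Assumed linear plant: outputs = G @ inputs + OFFSET
#   inputs  = [cpu_freq_ghz, gpu_freq_ghz]
#   outputs = [latency_ms, power_w]
G = [[-20.0, -40.0],   # raising either frequency lowers latency
     [2.0, 4.5]]       # raising either frequency raises power
OFFSET = [120.0, 1.0]

def plant(u):
    """Simulated device response to the frequency settings u."""
    y = matvec(G, u)
    return [y[0] + OFFSET[0], y[1] + OFFSET[1]]

def control_step(u, y, ref, gain=0.5):
    """One integral update: u <- u + gain * G^-1 @ (ref - y)."""
    err = [ref[0] - y[0], ref[1] - y[1]]
    du = matvec(inv2(G), err)
    return [u[0] + gain * du[0], u[1] + gain * du[1]]

def run(ref, u0, steps=30):
    """Iterate the controller against the simulated plant."""
    u = list(u0)
    for _ in range(steps):
        u = control_step(u, plant(u), ref)
    return u, plant(u)

if __name__ == "__main__":
    # Target 60 ms latency and 7.5 W power, starting from 1.5 GHz / 0.5 GHz.
    u, y = run(ref=[60.0, 7.5], u0=[1.5, 0.5])
    print("frequencies:", u, "outputs:", y)
```

Because the update uses the full inverse of `G`, a latency error changes both frequencies at once while accounting for their joint effect on power. A real controller would also clamp the inputs to the frequency levels the hardware supports and identify the model from measurements; neither is shown here.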
Pages: 3778-3789
Number of pages: 12