Accelerating DNN Inference by Edge-Cloud Collaboration

Cited by: 2
Authors
Chen, Jianan [1]
Qi, Qi [1]
Wang, Jingyu [1]
Sun, Haifeng [1]
Liao, Jianxin [1]
Affiliations
[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing, Peoples R China
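Source
2021 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE (IPCCC) | 2021
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords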
DNN inference; edge devices; dynamic partition; intelligent application;
DOI
10.1109/IPCCC51483.2021.9679434
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Deep neural networks (DNNs) have become indispensable tools for intelligent applications, and the demand for deploying DNNs on edge devices is increasing dramatically. This is challenging because DNN inference is computation-intensive, whereas edge devices are typically resource-constrained. Prior solutions addressed this challenge through collaboration between the cloud and edge devices, but they do not take the inference request rate into account, even though the inference delay increases dramatically as the request rate grows. In this paper, we propose a scheme that dynamically partitions a DNN into two or three parts and distributes them across the edge and the cloud, achieving the lowest delay as the request rate changes. The scheme selects the optimal partition points of the DNN using a layer evaluation model (LEM) and a total delay prediction model (DPM) under different request rates. Experiments with distributed deployment of the AlexNet, VGG, NiN, and ResNet models on the ImageNet image classification dataset show that the proposed scheme significantly reduces the total end-to-end latency by fully using both edge and cloud resources. It reduces the inference delay by 1.3 to 1.6 times and improves the throughput by 1.2 to 1.7 times compared with the state-of-the-art partition approach.
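To make the idea of request-rate-aware partitioning concrete, the following is a minimal sketch of choosing a single split point between edge and cloud. It is not the paper's LEM or DPM, which the abstract does not specify: the per-layer latencies, tensor sizes, bandwidth, the function names `queue_delay` and `best_partition`, and the M/M/1-style queueing model used to estimate delay under load are all assumptions made for illustration. Only the two-part case is shown.

```python
from typing import Sequence, Tuple


def queue_delay(service_ms: float, rate_per_ms: float) -> float:
    """Mean time in an M/M/1 queue (assumed model); inf if overloaded."""
    if service_ms == 0.0:
        return 0.0
    utilization = rate_per_ms * service_ms
    if utilization >= 1.0:
        return float("inf")
    return service_ms / (1.0 - utilization)


def best_partition(
    edge_ms: Sequence[float],    # hypothetical per-layer latency on the edge
    cloud_ms: Sequence[float],   # hypothetical per-layer latency in the cloud
    out_kb: Sequence[float],     # hypothetical output size of each layer
    input_kb: float,             # size of the raw input
    bandwidth_kb_per_ms: float,  # edge-to-cloud uplink bandwidth
    rate_per_ms: float,          # inference request rate
) -> Tuple[int, float]:
    """Return (k, delay): layers [0, k) run on the edge, [k, n) in the cloud."""
    n = len(edge_ms)
    best_k, best_delay = 0, float("inf")
    for k in range(n + 1):
        edge_t = sum(edge_ms[:k])
        cloud_t = sum(cloud_ms[k:])
        if k == n:
            transfer_t = 0.0  # everything runs on the edge, nothing is uploaded
        else:
            data_kb = input_kb if k == 0 else out_kb[k - 1]
            transfer_t = data_kb / bandwidth_kb_per_ms
        # Treat edge, uplink, and cloud as three queues in series.
        delay = (
            queue_delay(edge_t, rate_per_ms)
            + queue_delay(transfer_t, rate_per_ms)
            + queue_delay(cloud_t, rate_per_ms)
        )
        if delay < best_delay:
            best_k, best_delay = k, delay
    return best_k, best_delay


if __name__ == "__main__":
    # Made-up three-layer profile, for illustration only.
    edge = [5.0, 30.0, 40.0]
    cloud = [1.0, 3.0, 4.0]
    sizes = [30.0, 2.0, 1.0]
    for rate in (0.001, 0.02):
        k, delay = best_partition(edge, cloud, sizes, 60.0, 1.0, rate)
        print(f"rate={rate}/ms -> split after layer {k}, est. delay {delay:.1f} ms")
```

With these made-up numbers, the chosen split moves from after layer 2 at the low request rate to after layer 1 at the high rate, because queueing on the slower edge layers starts to dominate. This mirrors the abstract's point that the best partition depends on the request rate, without claiming to reproduce the paper's models.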
Pages: 7