Multi-Exit DNN Inference Acceleration Based on Multi-Dimensional Optimization for Edge Intelligence

Cited by: 39
Authors
Dong, Fang [1 ]
Wang, Huitian [1 ]
Shen, Dian [1 ]
Huang, Zhaowu [1 ]
He, Qiang [2 ]
Zhang, Jinghui [1 ]
Wen, Liangsheng [3 ]
Zhang, Tingting [3 ]
Affiliations
[1] Southeast Univ, Sch Comp Sci & Engn, Nanjing 211189, Peoples R China
[2] Swinburne Univ Technol, Dept Comp Technol, Melbourne 3122, Australia
[3] China Mobile Res Inst, Beijing 100000, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Edge intelligence; exit selection; inference acceleration; model partition; multi-exit DNN; resource allocation;
DOI
10.1109/TMC.2022.3172402
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812
Abstract
Edge intelligence, as a prospective paradigm for accelerating DNN inference, is mostly implemented by model partitioning, which inevitably incurs large transmission overhead for the DNN's intermediate data. A popular solution introduces multi-exit DNNs to reduce latency by enabling early exits. However, existing work ignores the correlation between exit settings and synergistic inference, causing device-edge incoordination. To address this issue, this paper first investigates the bottlenecks of executing multi-exit DNNs in edge computing and builds a novel model for inference acceleration with exit selection, model partition, and resource allocation. To tackle the intractable coupled subproblems, we propose a Multi-exit DNN inference Acceleration framework based on Multi-dimensional Optimization (MAMO). In MAMO, the exit selection subproblem is first extracted from the original problem. Then, bidirectional dynamic programming is employed to determine the optimal exit setting for an arbitrary multi-exit DNN. Finally, based on the optimal exit setting, a DRL-based policy is developed to learn joint decisions of model partition and resource allocation. We deploy MAMO on a real-world testbed and evaluate its performance in various scenarios. Extensive experiments show that it can adapt to heterogeneous tasks and dynamic networks, and accelerate DNN inference by up to 13.7x compared with the state-of-the-art.
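The early-exit mechanism the abstract builds on can be sketched in a few lines. The confidence-threshold policy, function names, and toy network segments below are illustrative assumptions for exposition only, not the paper's MAMO implementation (which selects exits via bidirectional dynamic programming rather than a fixed threshold).

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def multi_exit_infer(x, stages, exit_heads, threshold=0.9):
    """Run a multi-exit DNN: after each attached exit head, stop early once
    the top-1 softmax confidence reaches `threshold`.

    stages:     list of callables, each one DNN segment (features -> features)
    exit_heads: list of callables or None, one per stage (features -> logits);
                the final stage must have a head, acting as the mandatory exit
    Returns (predicted_class, exit_index).
    """
    feats = x
    for i, stage in enumerate(stages):
        feats = stage(feats)
        head = exit_heads[i]
        if head is None:
            continue  # this stage has no exit attached
        probs = softmax(head(feats))
        conf = max(probs)
        # Exit early if confident enough, or unconditionally at the final exit.
        if conf >= threshold or i == len(stages) - 1:
            return probs.index(conf), i
    raise ValueError("the final stage must carry an exit head")

# Toy usage: two segments, a low-confidence first exit and a confident last one.
stages = [lambda f: [v + 1.0 for v in f], lambda f: [v * 2.0 for v in f]]
heads = [lambda f: [0.0, 0.0], lambda f: [10.0, 0.0]]
cls, exit_idx = multi_exit_infer([1.0], stages, heads, threshold=0.9)
```

With a high threshold, the uniform first-exit distribution (confidence 0.5) is rejected and inference runs to the final exit; lowering the threshold lets the sample leave at the first exit, trading accuracy for latency — the trade-off that exit selection optimizes.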
Pages: 5389-5405
Page count: 17