HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

Citations: 31
Authors
Wu, Jing [1 ]
Wang, Lin [2 ,3 ]
Pei, Qiangyu [1 ]
Cui, Xingqi [1 ]
Liu, Fangming [1 ]
Yang, Tingting [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab,Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] Vrije Univ Amsterdam, NL-1081 HV Amsterdam, Netherlands
[3] Tech Univ Darmstadt, D-64289 Darmstadt, Germany
[4] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Keywords
Deep learning inference; edge computing; resource allocation; systems for machine learning;
DOI
10.1109/TPDS.2022.3195664
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202 ;
Abstract
Deep neural networks (DNNs) have become a critical component for inference in modern mobile applications, but the efficient provisioning of DNNs is non-trivial. Existing mobile- and server-based approaches compromise either the inference accuracy or latency. Instead, a hybrid approach can reap the benefits of the two by splitting the DNN at an appropriate layer and running the two parts separately on the mobile and the server, respectively. Nevertheless, the DNN throughput in the hybrid approach has not been carefully examined, which is particularly important for edge servers where limited compute resources are shared among multiple DNNs. This article presents HiTDL, a runtime framework for managing multiple DNNs provisioned following the hybrid approach at the edge. HiTDL's mission is to improve edge resource efficiency by optimizing the combined throughput of all co-located DNNs, while still guaranteeing their SLAs. To this end, HiTDL first builds comprehensive performance models for DNN inference latency and throughput with respect to multiple factors including resource availability, DNN partition plan, and cross-DNN interference. HiTDL then uses these models to generate a set of candidate partition plans with SLA guarantees for each DNN. Finally, HiTDL makes global throughput-optimal resource allocation decisions by selecting partition plans from the candidate set for each DNN via solving a fairness-aware multiple-choice knapsack problem. Experimental results based on a prototype implementation show that HiTDL improves the overall throughput of the edge by 4.3x compared with the state-of-the-art.
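The final allocation step the abstract describes, choosing one SLA-feasible partition plan per DNN so that combined throughput is maximized under a shared edge resource budget, can be sketched as a standard multiple-choice knapsack dynamic program. This is an illustrative sketch only: the plan costs, throughput values, and the omission of HiTDL's fairness weighting and performance models are assumptions, not details from the paper.

```python
# Hypothetical sketch of the plan-selection step: a multiple-choice knapsack
# where exactly one candidate partition plan must be chosen per DNN.
# All numbers below are illustrative; HiTDL's fairness-aware variant and its
# latency/throughput models are not reproduced here.

def mckp_select(dnns, budget):
    """Pick one (cost, throughput) plan per DNN to maximize total throughput.

    dnns:   list of candidate-plan lists; each plan is (resource_cost, tput).
    budget: total edge resource units available.
    Returns (best_total_throughput, chosen_plan_indices), or (None, None)
    if no feasible selection fits the budget.
    """
    NEG = float("-inf")
    # best[c] = (max total throughput, plan choices) with total cost exactly c
    best = [(0.0, [])] + [(NEG, None)] * budget
    for plans in dnns:
        nxt = [(NEG, None)] * (budget + 1)
        for c in range(budget + 1):
            val, picks = best[c]
            if picks is None:
                continue  # cost c unreachable so far
            for i, (cost, tput) in enumerate(plans):
                nc = c + cost
                if nc <= budget and val + tput > nxt[nc][0]:
                    nxt[nc] = (val + tput, picks + [i])
        best = nxt
    val, picks = max(best, key=lambda t: t[0])
    return (val, picks) if picks is not None else (None, None)

# Two DNNs, each with three SLA-feasible candidate plans
# (resource cost, throughput in requests/s) -- made-up numbers:
dnns = [
    [(2, 30.0), (3, 45.0), (5, 60.0)],
    [(1, 10.0), (2, 25.0), (4, 50.0)],
]
print(mckp_select(dnns, budget=6))  # -> (80.0, [0, 2])
```

The multiple-choice structure (exactly one plan per DNN) is what distinguishes this from a plain knapsack: a DNN cannot be left unserved, so infeasible budgets surface as `(None, None)` rather than as an empty selection.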
Pages: 4499 - 4514
Page count: 16