HiTDL: High-Throughput Deep Learning Inference at the Hybrid Mobile Edge

Citations: 31
Authors
Wu, Jing [1 ]
Wang, Lin [2 ,3 ]
Pei, Qiangyu [1 ]
Cui, Xingqi [1 ]
Liu, Fangming [1 ]
Yang, Tingting [4 ]
Affiliations
[1] Huazhong Univ Sci & Technol, Natl Engn Res Ctr Big Data Technol & Syst, Serv Comp Technol & Syst Lab, Cluster & Grid Comp Lab,Sch Comp Sci & Technol, Wuhan 430074, Peoples R China
[2] Vrije Univ Amsterdam, NL-1081 HV Amsterdam, Netherlands
[3] Tech Univ Darmstadt, D-64289 Darmstadt, Germany
[4] Peng Cheng Lab, Shenzhen 518066, Peoples R China
Keywords
Deep learning inference; edge computing; resource allocation; systems for machine learning;
DOI
10.1109/TPDS.2022.3195664
Chinese Library Classification
TP301 [Theory, Methods];
Discipline Code
081202 ;
Abstract
Deep neural networks (DNNs) have become a critical component for inference in modern mobile applications, but the efficient provisioning of DNNs is non-trivial. Existing mobile- and server-based approaches compromise either the inference accuracy or latency. Instead, a hybrid approach can reap the benefits of the two by splitting the DNN at an appropriate layer and running the two parts separately on the mobile and the server, respectively. Nevertheless, the DNN throughput in the hybrid approach has not been carefully examined, which is particularly important for edge servers where limited compute resources are shared among multiple DNNs. This article presents HiTDL, a runtime framework for managing multiple DNNs provisioned following the hybrid approach at the edge. HiTDL's mission is to improve edge resource efficiency by optimizing the combined throughput of all co-located DNNs, while still guaranteeing their SLAs. To this end, HiTDL first builds comprehensive performance models for DNN inference latency and throughput with respect to multiple factors including resource availability, DNN partition plan, and cross-DNN interference. HiTDL then uses these models to generate a set of candidate partition plans with SLA guarantees for each DNN. Finally, HiTDL makes global throughput-optimal resource allocation decisions by selecting partition plans from the candidate set for each DNN via solving a fairness-aware multiple-choice knapsack problem. Experimental results based on a prototype implementation show that HiTDL improves the overall throughput of the edge by 4.3x compared with the state-of-the-art.
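The final allocation step the abstract describes, choosing one SLA-feasible partition plan per DNN so that combined throughput is maximized under a shared edge resource budget, can be sketched as a standard multiple-choice knapsack dynamic program. This is an illustrative sketch only: the plan costs, throughput values, and the omission of HiTDL's fairness weighting and performance models are assumptions, not details from the paper.

```python
# Hypothetical sketch of the plan-selection step: a multiple-choice knapsack
# where exactly one candidate partition plan must be chosen per DNN.
# All numbers below are illustrative; HiTDL's fairness-aware variant and its
# latency/throughput models are not reproduced here.

def mckp_select(dnns, budget):
    """Pick one (cost, throughput) plan per DNN to maximize total throughput.

    dnns:   list of candidate-plan lists; each plan is (resource_cost, tput).
    budget: total edge resource units available.
    Returns (best_total_throughput, chosen_plan_indices), or (None, None)
    if no feasible selection fits the budget.
    """
    NEG = float("-inf")
    # best[c] = (max total throughput, plan choices) with total cost exactly c
    best = [(0.0, [])] + [(NEG, None)] * budget
    for plans in dnns:
        nxt = [(NEG, None)] * (budget + 1)
        for c in range(budget + 1):
            val, picks = best[c]
            if picks is None:
                continue  # cost c unreachable so far
            for i, (cost, tput) in enumerate(plans):
                nc = c + cost
                if nc <= budget and val + tput > nxt[nc][0]:
                    nxt[nc] = (val + tput, picks + [i])
        best = nxt
    val, picks = max(best, key=lambda t: t[0])
    return (val, picks) if picks is not None else (None, None)

# Two DNNs, each with three SLA-feasible candidate plans
# (resource cost, throughput in requests/s) -- made-up numbers:
dnns = [
    [(2, 30.0), (3, 45.0), (5, 60.0)],
    [(1, 10.0), (2, 25.0), (4, 50.0)],
]
print(mckp_select(dnns, budget=6))  # -> (80.0, [0, 2])
```

The multiple-choice structure (exactly one plan per DNN) is what distinguishes this from a plain knapsack: a DNN cannot be left unserved, so infeasible budgets surface as `(None, None)` rather than as an empty selection.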
Pages: 4499 - 4514
Page count: 16