Distributed Assignment With Load Balancing for DNN Inference at the Edge

Cited by: 15
Authors
Xu, Yuzhe [1]
Mohammed, Thaha [2]
Di Francesco, Mario [2]
Fischione, Carlo [3]
Affiliations
[1] KTH Royal Inst Technol, Stockholm 11428, Sweden
[2] Aalto Univ, Dept Comp Sci, Espoo 02150, Finland
[3] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Stockholm 11428, Sweden
Funding
Academy of Finland;
Keywords
Servers; Task analysis; Computational modeling; Internet of Things; Training; Edge computing; Computer architecture; Assignment problems; distributed inference; deep neural network (DNN) offloading; edge computing; INTELLIGENCE; ASSOCIATION; NETWORKS; INTERNET; THINGS; IOT;
DOI
10.1109/JIOT.2022.3205410
CLC number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Inference carried out on pretrained deep neural networks (DNNs) is particularly effective as it does not require retraining and entails no loss in accuracy. Unfortunately, resource-constrained devices such as those in the Internet of Things may need to offload the related computation to more powerful servers, particularly at the network edge. However, edge servers have limited resources compared to those in the cloud; therefore, inference offloading generally requires dividing the original DNN into different pieces that are then assigned to multiple edge servers. Related approaches in the state of the art either make strong assumptions on the system model or fail to provide strict performance guarantees. This article specifically addresses these limitations by applying distributed assignment to DNN inference at the edge. In particular, it devises a detailed model of DNN-based inference suitable for realistic scenarios involving edge computing. Optimal inference offloading with load balancing is also defined as a multiple assignment problem that maximizes proportional fairness. Moreover, a distributed algorithm for DNN inference offloading is introduced to solve such a problem in polynomial time with strong optimality guarantees. Finally, extensive simulations employing different data sets and DNN architectures establish that the proposed solution significantly improves upon the state of the art in terms of inference time (1.14 to 2.62 times faster), load balance (with a Jain's fairness index of 0.9), and convergence (an order of magnitude fewer iterations).
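For context, the two fairness notions named in the abstract have standard definitions; the following is a sketch under assumed notation, where the load $x_s$ assigned to edge server $s$, for $s = 1, \dots, S$, is a hypothetical symbol not taken from the paper. Proportional fairness is conventionally achieved by maximizing a sum of logarithmic utilities over the feasible assignments $\mathcal{X}$:

$\max_{x \in \mathcal{X}} \; \sum_{s=1}^{S} \log x_s$

and Jain's fairness index, which equals $1$ for a perfectly balanced load and $1/S$ in the most unbalanced case, is

$J(x) = \left( \sum_{s=1}^{S} x_s \right)^{2} \Big/ \left( S \sum_{s=1}^{S} x_s^{2} \right).$

An index of 0.9, as reported in the abstract, therefore indicates a near-uniform load across the edge servers.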
Pages: 1053-1065
Page count: 13