Towards Real-Time Inference Offloading With Distributed Edge Computing: The Framework and Algorithms

被引:8
作者
Chen, Quan [1 ]
Guo, Song [2 ]
Wang, Kaijia [1 ]
Xu, Wenchao [3 ]
Li, Jing [3 ]
Cai, Zhipeng [4 ]
Gao, Hong [5 ]
Zomaya, Albert Y. [6 ]
机构
[1] Guangdong Univ Technol, Sch Comp Sci & Technol, Guangzhou 510006, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong 999077, Peoples R China
[3] Hong Kong Polytech Univ, Dept Comp, Hong Kong 999077, Peoples R China
[4] Georgia State Univ, Dept Comp Sci, Atlanta, GA 30303 USA
[5] Zhejiang Normal Univ, Sch Comp Sci & Technol, Jinhua 321017, Peoples R China
[6] Univ Sydney, Sch Comp Sci, Sydney, NSW 2050, Australia
关键词
Distributed edge computing; edge inference; parallel computing; task assignment; SERVICE PLACEMENT; COMPUTATION;
D O I
10.1109/TMC.2023.3335051
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
By combining edge computing and parallel computing, distributed edge computing has emerged as a new paradigm to exploit the booming IoT devices at the edge. To accelerate computation at the edge, i.e., the inference tasks for DNN-driven applications, the parallelism of both computation and communication needs to be considered for distributed edge computing, and thus, the problem of Minimum Latency joint Communication and Computation Scheduling (MLCCS) is proposed. However, existing works have rigid assumptions that the communication time of each device is fixed and the workload can be split arbitrarily small. Aiming at making the work more practical and general, the MLCCS problem without the above assumptions is studied in this paper. First, the MLCCS problem under a general model is formulated and proved to be NP-hard. Second, a pyramid-based computing model is proposed to consider the parallelism of communication and computation jointly, which has an approximation ratio of $1+\delta$1+delta, where $\delta$delta is related to devices' communication rates. An interesting property under such a computing model is identified and proved, i.e., the optimal latency can be obtained under arbitrary scheduling order when all the devices share the same communication rate. When the workload cannot be split arbitrarily, an approximation algorithm with a ratio of at most $2\cdot (1+\delta)$2<middle dot>(1+delta) is proposed. Additionally, for handling the dynamically changing network scenarios, several algorithms are also proposed accordingly. Finally, the theoretical analysis and simulation results verify that the proposed algorithm has high performance in terms of latency. Two testbed experiments are also conducted, which show that the proposed method outperforms the existing methods, reducing the latency by up to 29.2% for inference tasks at the edge.
引用
收藏
页码:7552 / 7571
页数:20
相关论文
共 44 条
[1]  
3gpp, 2022, 3GPP specifications
[2]  
[Anonymous], 2021, IEEE J. Sel. Areas Commun., V39, P3688
[3]  
Bluetooth, 2022, About us
[4]   Latency-and-Coverage Aware Data Aggregation Scheduling for Multihop Battery-Free Wireless Networks [J].
Cai, Zhipeng ;
Chen, Quan .
IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2021, 20 (03) :1770-1784
[5]  
Chang H, 2011, IEEE INFOCOM SER, P3074, DOI 10.1109/INFCOM.2011.5935152
[6]  
Chen FF, 2012, IEEE INFOCOM SER, P1143, DOI 10.1109/INFCOM.2012.6195473
[7]   Computation Peer Offloading for Energy-Constrained Mobile Edge Computing in Small-Cell Networks [J].
Chen, Lixing ;
Zhou, Sheng ;
Xu, Jie .
IEEE-ACM TRANSACTIONS ON NETWORKING, 2018, 26 (04) :1619-1632
[8]   Task Offloading for Mobile Edge Computing in Software Defined Ultra-Dense Network [J].
Chen, Min ;
Hao, Yixue .
IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2018, 36 (03) :587-597
[9]  
Chen Q., 2023, P IEEE C COMP COMM, P1
[10]  
Chen Q., 2017, P IEEE INFOCOM C COM, P1