Two-Phase Split Computing Framework in Edge-Cloud Continuum

Cited by: 1

Authors:

Ko, Haneul [1]

Kim, Bokyeong [1]

Kim, Yumi [1]

Pack, Sangheon [2]

Affiliations:

[1] Kyung Hee Univ, Dept Elect & Informat Convergence Engn, Yongin 17104, Gyeonggi, South Korea

[2] Korea Univ, Sch Elect Engn, Seoul 02841, South Korea

Source:

IEEE INTERNET OF THINGS JOURNAL | 2024, Vol. 11, No. 12

Funding:

National Research Foundation, Singapore;

Keywords:

Cloud computing; Computational modeling; Mobile handsets; Internet of Things; Artificial neural networks; Performance evaluation; Optimization; Deep neural network (DNN); inference latency; interlayer splitting; intralayer splitting; two-phase split computing; ASSISTED FULL-DUPLEX; MASSIVE-MIMO; SPECTRAL EFFICIENCY; NETWORK; THROUGHPUT; PILOT;

DOI:

10.1109/JIOT.2024.3376977

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812;

Abstract:

Split computing is a promising approach to reduce the inference latency of deep neural network (DNN) models. In this article, we propose a two-phase split computing framework (TSCF). In TSCF, for vertical interlayer splitting between computing nodes at different levels (e.g., central and edge clouds), a shortest path problem in a directed graph is formulated and a pruning-based low-complexity solution is devised. In addition, for horizontal intralayer splitting between computing nodes at the same level (e.g., edge clouds), the execution units of a specific layer are further divided and distributed to those nodes in proportion to their available resources. The evaluation results demonstrate that TSCF can reduce inference latency by more than 38.8% compared to the traditional interlayer splitting scheme by efficiently using the resources of distributed computing nodes. In addition, it is demonstrated that near-optimal inference latency can be achieved even with the pruning-based low-complexity solution.
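
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cost model (per-layer compute latency per tier, transfer latency between tiers, and an input delivery cost) and the names interlayer_split, proportional_split, compute, transfer, and input_cost are assumptions made for illustration. The sketch uses plain Dijkstra search over a (layer, tier) graph and omits the pruning step the paper proposes; the proportional division uses largest-remainder rounding, which is also an assumption.

```python
import heapq


def interlayer_split(compute, transfer, input_cost):
    """Pick a tier for every layer by shortest-path search (hypothetical cost model).

    compute[l][t]     : latency of running layer l on tier t
    transfer[l][a][b] : latency of moving layer l's output from tier a to tier b
                        (needed for layers 0 .. L-2, zero when a == b)
    input_cost[t]     : latency of delivering the raw input to tier t
    Returns (total_latency, tier_per_layer).
    """
    num_layers, num_tiers = len(compute), len(compute[0])
    # Graph nodes are (layer, tier) pairs; start from every tier for layer 0.
    heap = [(input_cost[t] + compute[0][t], 0, t, (t,)) for t in range(num_tiers)]
    heapq.heapify(heap)
    settled = set()
    while heap:
        cost, layer, tier, path = heapq.heappop(heap)
        if (layer, tier) in settled:
            continue
        settled.add((layer, tier))
        if layer == num_layers - 1:
            # With non-negative weights, the first last-layer node popped is optimal.
            return cost, list(path)
        for nxt in range(num_tiers):
            step = transfer[layer][tier][nxt] + compute[layer + 1][nxt]
            heapq.heappush(heap, (cost + step, layer + 1, nxt, path + (nxt,)))
    raise ValueError("no path found")


def proportional_split(num_units, resources):
    """Divide a layer's execution units among same-level nodes in proportion
    to their available resources (largest-remainder rounding)."""
    total = sum(resources)
    if total <= 0:
        raise ValueError("total resources must be positive")
    exact = [num_units * r / total for r in resources]
    shares = [int(x) for x in exact]
    leftover = num_units - sum(shares)
    # Hand the remaining units to the nodes with the largest fractional parts.
    order = sorted(range(len(resources)),
                   key=lambda i: exact[i] - shares[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares
```

In this sketch, interlayer_split decides which level runs each layer, and proportional_split then divides the execution units of a chosen layer (for example, its output channels) among the nodes at that level.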


Pages: 21741-21749

Page count: 9
