Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Cited by: 0
Authors
Tang, Weiheng [1 ]
Li, Jingyi [1 ]
Chen, Lin [2 ]
Chen, Xu [1 ]
Affiliations
[1] Sun Yat-sen University, School of Computer Science and Engineering, Guangzhou 510275, People's Republic of China
[2] Sun Yat-sen University, School of Computer Science and Engineering, Guangdong Provincial Key Laboratory of Information Security Technology, Guangzhou 510275, People's Republic of China
Funding
National Science Foundation (USA)
Keywords
Encoding; Distance learning; Computer aided instruction; Computational modeling; Task analysis; Optimization; Computer architecture; Distributed learning; hierarchical architecture; stragglers tolerance; gradient coding; ALLOCATION;
DOI
10.1109/TCOMM.2024.3418901
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Edge computing has recently emerged as a promising paradigm for boosting the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an intermediate layer between the master and the workers of the original distributed learning system, potentially leading to a more severe straggler effect. Coding-theoretic approaches have recently been proposed for straggler mitigation in distributed learning, but most focus on the conventional worker-master architecture. In this paper, along a different line, we investigate the problem of mitigating the straggler effect in hierarchical distributed learning systems with an additional layer composed of edge nodes. Technically, we first derive the fundamental trade-off between the computational load of the workers and the straggler tolerance. We then propose a hierarchical gradient coding framework that achieves this trade-off and provides improved straggler mitigation. To further improve the performance of our framework in heterogeneous scenarios, we formulate an optimization problem that minimizes the expected execution time of each iteration of the learning process, and we develop an efficient algorithm that solves the problem and outputs the optimal strategy. Extensive simulation results demonstrate the superiority of our schemes over conventional solutions.
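For context on the load/tolerance trade-off the abstract refers to: in classical single-layer gradient coding (Tandon et al., ICML 2017), any scheme that tolerates s stragglers among n workers requires each worker to process at least s + 1 of the n data partitions. The sketch below is a minimal, uncoded cyclic-repetition illustration of that single-layer idea, not the paper's hierarchical coded scheme or its optimization; the function names and the toy setup are our own.

```python
import numpy as np

def cyclic_assignment(n_workers, s):
    """Split the data into n_workers parts and give each worker s + 1
    consecutive parts (cyclically). Every part is then replicated on
    s + 1 distinct workers: a per-worker load of s + 1 parts buys
    tolerance to any s stragglers."""
    return [[(i + j) % n_workers for j in range(s + 1)]
            for i in range(n_workers)]

def aggregate(partial_grads, assignment, alive, n_parts):
    """Recover the full gradient (sum over all parts) from the workers
    in `alive` only. Since each part lives on s + 1 workers, any s
    stragglers still leave at least one alive holder per part."""
    total = None
    for k in range(n_parts):
        src = next(i for i in alive if k in assignment[i])  # any alive holder of part k
        g = partial_grads[src][k]
        total = g if total is None else total + g
    return total

# Toy run: n = 6 workers, tolerate up to s = 2 stragglers.
n, s, dim = 6, 2, 4
rng = np.random.default_rng(0)
part_grads = [rng.standard_normal(dim) for _ in range(n)]   # ground-truth per-part gradients
assignment = cyclic_assignment(n, s)
reports = [{k: part_grads[k] for k in assignment[i]} for i in range(n)]
alive = [0, 2, 3, 5]                                        # workers 1 and 4 straggle
recovered = aggregate(reports, assignment, alive, n)
assert np.allclose(recovered, sum(part_grads))              # exact full gradient recovered
```

In full gradient coding, each worker would instead return a single coded linear combination of its partial gradients, and the master would decode the sum from any n - s replies; the repetition variant above gives up that communication saving for simplicity.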
Pages: 7727-7741
Page count: 15
Related Papers (showing 10 of 50)
  • [1] Nguyen Van Huynh; Dinh Thai Hoang; Nguyen, Diep N.; Dutkiewicz, Eryk. "Joint Coding and Scheduling Optimization for Distributed Learning Over Wireless Edge Networks." IEEE Journal on Selected Areas in Communications, 2022, 40(2): 484-498.
  • [2] Han, Dong-Jun; Sohn, Jy-Yong; Moon, Jaekyun. "Hierarchical Broadcast Coding: Expediting Distributed Learning at the Wireless Edge." IEEE Transactions on Wireless Communications, 2021, 20(4): 2266-2281.
  • [3] Wang, Qi; Cui, Ying; Li, Chenglin; Zou, Junni; Xiong, Hongkai. "Optimization-Based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning." IEEE Transactions on Signal Processing, 2023, 71: 1023-1038.
  • [4] Chen, Yicheng; Blum, Rick S.; Takac, Martin; Sadler, Brian M. "Distributed Learning With Sparsified Gradient Differences." IEEE Journal of Selected Topics in Signal Processing, 2022, 16(3): 585-600.
  • [5] Mao, Yingchi; Wu, Jun; He, Xiaoming; Ping, Ping; Wang, Jiajun; Wu, Jie. "Joint Dynamic Grouping and Gradient Coding for Time-Critical Distributed Machine Learning in Heterogeneous Edge Networks." IEEE Internet of Things Journal, 2022, 9(22): 22723-22736.
  • [6] Wang, Yu; Guo, Liang; Zhao, Yu; Yang, Jie; Adebisi, Bamidele; Gacanin, Haris; Gui, Guan. "Distributed Learning for Automatic Modulation Classification in Edge Devices." IEEE Wireless Communications Letters, 2020, 9(12): 2177-2181.
  • [7] Zhang, Jingjing; Simeone, Osvaldo. "LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning." IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(3): 962-974.
  • [8] Reisizadeh, Amirhossein; Prakash, Saurav; Pedarsani, Ramtin; Avestimehr, Amir Salman. "CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning." IEEE/ACM Transactions on Networking, 2022, 30(1): 148-161.
  • [9] Li, Chengxi; Skoglund, Mikael. "Distributed Learning Based on 1-Bit Gradient Coding in the Presence of Stragglers." IEEE Transactions on Communications, 2024, 72(8): 4903-4916.
  • [10] Wang, Tian; Liu, Yan; Zheng, Xi; Dai, Hong-Ning; Jia, Weijia; Xie, Mande. "Edge-Based Communication Optimization for Distributed Federated Learning." IEEE Transactions on Network Science and Engineering, 2022, 9(4): 2015-2024.