Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices

Cited by: 0
Authors
Tang, Weiheng [1 ]
Li, Jingyi [1 ]
Chen, Lin [2 ]
Chen, Xu [1 ]
Affiliations
[1] Sun Yat-sen University, School of Computer Science and Engineering, Guangzhou 510275, People's Republic of China
[2] Sun Yat-sen University, School of Computer Science and Engineering, Guangdong Provincial Key Laboratory of Information Security Technology, Guangzhou 510275, People's Republic of China
Funding
National Science Foundation (USA)
Keywords
Encoding; Distance learning; Computer aided instruction; Computational modeling; Task analysis; Optimization; Computer architecture; Distributed learning; hierarchical architecture; stragglers tolerance; gradient coding; ALLOCATION;
DOI
10.1109/TCOMM.2024.3418901
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
Edge computing has recently emerged as a promising paradigm for boosting the performance of distributed learning by leveraging the distributed resources at edge nodes. Architecturally, the introduction of edge nodes adds an intermediate layer between the master and the workers of the original distributed learning system, potentially leading to a more severe straggler effect. Coding-theoretic approaches have recently been proposed for straggler mitigation in distributed learning, but most focus on the conventional worker-master architecture. In this paper, along a different line, we investigate the problem of mitigating the straggler effect in hierarchical distributed learning systems with an additional layer composed of edge nodes. Technically, we first derive the fundamental trade-off between the computational load of the workers and the straggler tolerance. We then propose a hierarchical gradient coding framework that achieves this trade-off and provides improved straggler mitigation. To further improve the performance of our framework in heterogeneous scenarios, we formulate an optimization problem that minimizes the expected execution time of each iteration of the learning process, and we develop an efficient algorithm that solves the problem and outputs the optimal strategy. Extensive simulation results demonstrate the superiority of our schemes over conventional solutions.
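For context on the load/tolerance trade-off the abstract refers to: in classical single-layer gradient coding (Tandon et al., ICML 2017), any scheme that tolerates s stragglers among n workers requires each worker to process at least s + 1 of the n data partitions. The sketch below is a minimal, uncoded cyclic-repetition illustration of that single-layer idea, not the paper's hierarchical coded scheme or its optimization; the function names and the toy setup are our own.

```python
import numpy as np

def cyclic_assignment(n_workers, s):
    """Split the data into n_workers parts and give each worker s + 1
    consecutive parts (cyclically). Every part is then replicated on
    s + 1 distinct workers: a per-worker load of s + 1 parts buys
    tolerance to any s stragglers."""
    return [[(i + j) % n_workers for j in range(s + 1)]
            for i in range(n_workers)]

def aggregate(partial_grads, assignment, alive, n_parts):
    """Recover the full gradient (sum over all parts) from the workers
    in `alive` only. Since each part lives on s + 1 workers, any s
    stragglers still leave at least one alive holder per part."""
    total = None
    for k in range(n_parts):
        src = next(i for i in alive if k in assignment[i])  # any alive holder of part k
        g = partial_grads[src][k]
        total = g if total is None else total + g
    return total

# Toy run: n = 6 workers, tolerate up to s = 2 stragglers.
n, s, dim = 6, 2, 4
rng = np.random.default_rng(0)
part_grads = [rng.standard_normal(dim) for _ in range(n)]   # ground-truth per-part gradients
assignment = cyclic_assignment(n, s)
reports = [{k: part_grads[k] for k in assignment[i]} for i in range(n)]
alive = [0, 2, 3, 5]                                        # workers 1 and 4 straggle
recovered = aggregate(reports, assignment, alive, n)
assert np.allclose(recovered, sum(part_grads))              # exact full gradient recovered
```

In full gradient coding, each worker would instead return a single coded linear combination of its partial gradients, and the master would decode the sum from any n - s replies; the repetition variant above gives up that communication saving for simplicity.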
Pages: 7727-7741
Page count: 15
Related Papers (showing 10 of 50)
  • [1] Nguyen Van Huynh; Dinh Thai Hoang; Nguyen, Diep N.; Dutkiewicz, Eryk. "Joint Coding and Scheduling Optimization for Distributed Learning Over Wireless Edge Networks." IEEE Journal on Selected Areas in Communications, 2022, 40(2): 484-498.
  • [2] Han, Dong-Jun; Sohn, Jy-Yong; Moon, Jaekyun. "Hierarchical Broadcast Coding: Expediting Distributed Learning at the Wireless Edge." IEEE Transactions on Wireless Communications, 2021, 20(4): 2266-2281.
  • [3] Wang, Qi; Cui, Ying; Li, Chenglin; Zou, Junni; Xiong, Hongkai. "Optimization-Based Block Coordinate Gradient Coding for Mitigating Partial Stragglers in Distributed Learning." IEEE Transactions on Signal Processing, 2023, 71: 1023-1038.
  • [4] Chen, Yicheng; Blum, Rick S.; Takac, Martin; Sadler, Brian M. "Distributed Learning With Sparsified Gradient Differences." IEEE Journal of Selected Topics in Signal Processing, 2022, 16(3): 585-600.
  • [5] Mao, Yingchi; Wu, Jun; He, Xiaoming; Ping, Ping; Wang, Jiajun; Wu, Jie. "Joint Dynamic Grouping and Gradient Coding for Time-Critical Distributed Machine Learning in Heterogeneous Edge Networks." IEEE Internet of Things Journal, 2022, 9(22): 22723-22736.
  • [6] Wang, Yu; Guo, Liang; Zhao, Yu; Yang, Jie; Adebisi, Bamidele; Gacanin, Haris; Gui, Guan. "Distributed Learning for Automatic Modulation Classification in Edge Devices." IEEE Wireless Communications Letters, 2020, 9(12): 2177-2181.
  • [7] Zhang, Jingjing; Simeone, Osvaldo. "LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning." IEEE Transactions on Neural Networks and Learning Systems, 2021, 32(3): 962-974.
  • [8] Reisizadeh, Amirhossein; Prakash, Saurav; Pedarsani, Ramtin; Avestimehr, Amir Salman. "CodedReduce: A Fast and Robust Framework for Gradient Aggregation in Distributed Learning." IEEE/ACM Transactions on Networking, 2022, 30(1): 148-161.
  • [9] Li, Chengxi; Skoglund, Mikael. "Distributed Learning Based on 1-Bit Gradient Coding in the Presence of Stragglers." IEEE Transactions on Communications, 2024, 72(8): 4903-4916.
  • [10] Wang, Tian; Liu, Yan; Zheng, Xi; Dai, Hong-Ning; Jia, Weijia; Xie, Mande. "Edge-Based Communication Optimization for Distributed Federated Learning." IEEE Transactions on Network Science and Engineering, 2022, 9(4): 2015-2024.