Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast

Cited by: 0
Authors
Luo, Shouxi [1 ]
Fan, Pingzhi [2 ]
Li, Ke [1 ]
Xing, Huanlai [1 ]
Luo, Long [3 ]
Yu, Hongfang [3 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Southwest Jiaotong Univ, CSNMT Int Cooperat Res Ctr, Key Lab Informat Coding & Transmiss, Chengdu 611756, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
Keywords
Training; Synchronization; Receivers; Peer-to-peer computing; Convergence; Distance learning; Computer aided instruction; Bandwidth; Optimization; Multicast algorithms; Distributed learning; parameter synchronization; receiver selection;
DOI
10.1109/TSC.2024.3506480
Chinese Library Classification (CLC) code
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Recent advances in distributed machine learning show, both theoretically and empirically, that for many models, provided that every worker eventually participates in synchronization, i) training still converges even if only p workers take part in each synchronization round, and ii) a larger p generally leads to faster convergence. These findings point to a way of eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing parameter synchronization for peer-to-peer distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose SELMCAST, a suite of expressive and efficient multicast receiver selection algorithms. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly p receivers for each worker's multicast in a bandwidth-agnostic way, SELMCAST chooses receivers based on a global view of their available bandwidth and load, yielding two advantages: accelerated parameter synchronization, which raises the utilization of computing resources, and enlarged average p values, which speed up convergence. Comprehensive evaluations show that SELMCAST is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, significantly outperforming the SOTA solution.
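The abstract describes the selection policy only at a high level. The sketch below is a minimal, hypothetical Python illustration, not the paper's actual SELMCAST algorithm, contrasting the bandwidth-agnostic random baseline with a greedy receiver choice driven by a global view of spare bandwidth and load. All names (select_random, select_bandwidth_aware), the scoring heuristic, and the bandwidth figures are assumptions introduced for illustration.

import random

def select_random(workers, sender, p, rng=random):
    """Bandwidth-agnostic baseline: pick exactly p receivers uniformly at random."""
    candidates = [w for w in workers if w != sender]
    return rng.sample(candidates, min(p, len(candidates)))

def select_bandwidth_aware(workers, sender, p, bandwidth, load):
    """Hypothetical greedy heuristic (not the published SELMCAST algorithm):
    prefer receivers with more spare bandwidth and lower current receive load,
    i.e., a choice informed by a global bandwidth/load view rather than chance."""
    candidates = [w for w in workers if w != sender]
    # Rank candidates: higher spare bandwidth first, then lower existing load.
    ranked = sorted(candidates, key=lambda w: (-bandwidth[w], load[w]))
    chosen = ranked[:min(p, len(candidates))]
    for w in chosen:
        load[w] += 1  # book-keep the extra multicast stream this receiver now serves
    return chosen

if __name__ == "__main__":
    workers = ["w0", "w1", "w2", "w3", "w4"]
    bandwidth = {"w0": 10, "w1": 4, "w2": 8, "w3": 2, "w4": 6}  # spare Gbps (assumed)
    load = {w: 0 for w in workers}
    print(select_random(workers, "w0", p=2))
    print(select_bandwidth_aware(workers, "w0", p=2, bandwidth=bandwidth, load=load))

The greedy ranking merely stands in for whatever optimization SELMCAST actually solves; the point is only that receiver selection depends on global bandwidth and load information instead of uniform sampling.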
Pages: 156-168
Number of pages: 13