Efficient Parameter Synchronization for Peer-to-Peer Distributed Learning With Selective Multicast

Cited by: 0
Authors
Luo, Shouxi [1 ]
Fan, Pingzhi [2 ]
Li, Ke [1 ]
Xing, Huanlai [1 ]
Luo, Long [3 ]
Yu, Hongfang [3 ]
Affiliations
[1] Southwest Jiaotong Univ, Sch Comp & Artificial Intelligence, Chengdu 611756, Peoples R China
[2] Southwest Jiaotong Univ, CSNMT Int Cooperat Res Ctr, Key Lab Informat Coding & Transmiss, Chengdu 611756, Peoples R China
[3] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu 611731, Peoples R China
Keywords
Training; Synchronization; Receivers; Peer-to-peer computing; Convergence; Distance learning; Computer aided instruction; Bandwidth; Optimization; Multicast algorithms; Distributed learning; parameter synchronization; receiver selection
DOI
10.1109/TSC.2024.3506480
CLC Classification Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Recent advances in distributed machine learning show, both theoretically and empirically, that for many models, provided every worker eventually participates in synchronization, i) training still converges even if only p workers take part in each round of synchronization, and ii) a larger p generally yields a faster rate of convergence. These findings shed light on eliminating the bottleneck effects of parameter synchronization in large-scale data-parallel distributed training and have motivated several optimization designs. In this paper, we focus on optimizing parameter synchronization for peer-to-peer distributed learning, where workers broadcast or multicast their updated parameters to others for synchronization, and propose SELMCAST, a suite of expressive and efficient multicast receiver selection algorithms, to achieve this goal. Compared with the state-of-the-art (SOTA) design, which randomly selects exactly p receivers for each worker's multicast in a bandwidth-agnostic way, SELMCAST chooses receivers based on a global view of their available bandwidth and loads, yielding two advantages: accelerated parameter synchronization for higher utilization of computing resources, and enlarged average p values for faster convergence. Comprehensive evaluations show that SELMCAST is efficient for both peer-to-peer Bulk Synchronous Parallel (BSP) and Stale Synchronous Parallel (SSP) distributed training, significantly outperforming the SOTA solution.
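To make the receiver-selection idea concrete, the sketch below contrasts the bandwidth-agnostic baseline (each sender picks exactly p receivers uniformly at random) with a bandwidth- and load-aware greedy heuristic. This is a minimal illustrative sketch, not the paper's actual SELMCAST algorithms; the scoring rule, the per-worker bandwidth and load dictionaries, and the function names are all assumptions introduced here for illustration.

import random

def random_receivers(workers, sender, p, rng=random):
    # Bandwidth-agnostic baseline: choose exactly p receivers uniformly at
    # random from all workers other than the sender.
    candidates = [w for w in workers if w != sender]
    return rng.sample(candidates, min(p, len(candidates)))

def bandwidth_aware_receivers(workers, sender, p, bandwidth, load):
    # Hypothetical greedy heuristic (NOT the paper's algorithm): rank candidate
    # receivers by available bandwidth discounted by their current receive
    # load, then take the top p.
    candidates = [w for w in workers if w != sender]
    score = lambda w: bandwidth[w] / (1.0 + load[w])
    return sorted(candidates, key=score, reverse=True)[:p]

if __name__ == "__main__":
    workers = ["w0", "w1", "w2", "w3", "w4"]
    # Hypothetical per-worker available bandwidth (Gbps) and queued multicasts.
    bandwidth = {"w0": 10.0, "w1": 2.5, "w2": 8.0, "w3": 5.0, "w4": 1.0}
    load = {"w0": 1, "w1": 0, "w2": 3, "w3": 0, "w4": 2}
    print(random_receivers(workers, "w0", p=2))
    print(bandwidth_aware_receivers(workers, "w0", p=2, bandwidth=bandwidth, load=load))

Under this assumed scoring, lightly loaded, high-bandwidth workers are favored as receivers, which mirrors the abstract's stated goal of avoiding bandwidth bottlenecks during synchronization while keeping the effective p as large as possible.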
Pages: 156-168
Number of pages: 13