Distributed Learning Based on 1-Bit Gradient Coding in the Presence of Stragglers

Cited by: 3
Authors
Li, Chengxi [1 ]
Skoglund, Mikael [1 ]
Affiliations
[1] KTH Royal Inst Technol, Sch Elect Engn & Comp Sci, Div Informat Sci & Engn, S-10044 Stockholm, Sweden
Keywords
Vectors; Quantization (signal); Convergence; Training data; Training; Encoding; Costs; Distributed learning; 1-bit quantization; stragglers; communication overhead; convergence analysis;
DOI
10.1109/TCOMM.2024.3377715
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Subject Classification Codes
0808; 0809
Abstract
This paper considers the problem of distributed learning (DL) in the presence of stragglers. For this problem, DL methods based on gradient coding have been widely investigated; they distribute the training data redundantly across the workers to guarantee convergence when some workers are stragglers. However, these methods require the workers to transmit real-valued vectors during learning, which incurs a very high communication burden. To overcome this drawback, we propose a novel DL method based on 1-bit gradient coding (1-bit GC-DL), in which the workers transmit 1-bit data encoded from the locally computed gradients to reduce the communication overhead. We provide theoretical convergence guarantees for the proposed method for both convex and non-convex loss functions. It is shown empirically that 1-bit GC-DL outperforms baseline methods, attaining better learning performance under the same communication overhead.
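To make the idea concrete, below is a minimal sketch of how 1-bit gradient coding with straggler tolerance could look, assuming a signSGD-style sign quantizer, a cyclic redundant assignment of data partitions, and majority-vote decoding at the server. The abstract does not specify the paper's exact encoding and decoding rules, so all function names, the partition assignment, and the decoding rule here are illustrative assumptions rather than the authors' scheme.

```python
import numpy as np

# Hypothetical sketch of 1-bit gradient coding with stragglers (not the
# paper's exact method): each worker holds several data partitions
# redundantly, sums the corresponding gradients, and transmits only the
# per-coordinate sign; the server majority-votes over the surviving workers.

def assign_redundant_partitions(num_workers, redundancy):
    """Assign each worker `redundancy` consecutive partitions (cyclically),
    so gradient information survives when some workers straggle."""
    return [
        [(w + r) % num_workers for r in range(redundancy)]
        for w in range(num_workers)
    ]

def worker_encode(local_gradients):
    """Sum the gradients of the assigned partitions and keep only the sign,
    i.e. send 1 bit per coordinate instead of a real value."""
    combined = np.sum(local_gradients, axis=0)
    return np.sign(combined).astype(np.int8)  # entries in {-1, 0, +1}

def server_decode(received_bits, learning_rate):
    """Aggregate the 1-bit messages from non-straggling workers by a
    coordinate-wise majority vote and form the model update."""
    vote = np.sign(np.sum(received_bits, axis=0))
    return -learning_rate * vote

# Toy usage: 4 workers, redundancy 2, worker 3 is a straggler.
rng = np.random.default_rng(0)
dim, num_workers, redundancy = 5, 4, 2
partition_grads = rng.normal(size=(num_workers, dim))  # one gradient per partition
assignment = assign_redundant_partitions(num_workers, redundancy)

messages = [worker_encode(partition_grads[assignment[w]]) for w in range(num_workers)]
survivors = [messages[w] for w in range(num_workers) if w != 3]  # drop the straggler
update = server_decode(np.stack(survivors), learning_rate=0.01)
print(update)
```

In this sketch the redundancy factor trades extra local computation for robustness: with redundancy 2, any single straggler still leaves every partition's gradient represented in at least one received 1-bit message.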
Pages: 4903-4916
Page count: 14