Distributed Learning With Sparsified Gradient Differences

Cited by: 10
Authors
Chen, Yicheng [1 ]
Blum, Rick S. [1 ]
Takac, Martin [2 ]
Sadler, Brian M. [3 ]
Affiliations
[1] Lehigh Univ, Bethlehem, PA 18015 USA
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi 51133, U Arab Emirates
[3] Army Res Lab, Adelphi, MD 20783 USA
Funding
U.S. National Science Foundation
Keywords
Convergence; Servers; Optimization; Signal processing algorithms; Wireless communication; Distance learning; Computer aided instruction; Communication-efficient; distributed learning; error correction; gradient compression; sparsification; wireless communications; worker-server architecture; OPTIMIZATION; DESCENT; SCHEME;
DOI
10.1109/JSTSP.2022.3162989
CLC classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline codes
0808; 0809
Abstract
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve communication efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communication load compared to the best existing algorithms without slowing down the optimization process.
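The worker-side update described in the abstract (gradient difference, magnitude-based sparsification, error correction) can be summarized in a short sketch. The snippet below is a minimal NumPy illustration, not the authors' reference implementation: the fixed magnitude threshold, the decay factor used for the linear combination of previously transmitted gradients, and all names (GDSECWorker, grad_fn, compute_message) are assumptions made for illustration only; the paper's actual thresholding and weighting rules may differ.

```python
import numpy as np

class GDSECWorker:
    """Worker-side sketch: sparsified gradient differences with error correction."""

    def __init__(self, grad_fn, dim, threshold=1e-3, decay=0.9):
        self.grad_fn = grad_fn          # local gradient oracle (assumed interface)
        self.state = np.zeros(dim)      # linear combination of previously transmitted gradients
        self.residual = np.zeros(dim)   # accumulated sparsification error (error correction)
        self.threshold = threshold      # per-component magnitude threshold (assumed rule)
        self.decay = decay              # weight of the linear combination (assumed value)

    def compute_message(self, x):
        """Return the sparsified gradient difference to transmit for iterate x."""
        grad = self.grad_fn(x)
        # Difference between the current gradient and the state built from previously
        # transmitted values, with the error suppressed in earlier rounds added back in.
        diff = grad - self.state + self.residual
        # Keep only components whose magnitude is sufficiently large; the rest are not sent.
        mask = np.abs(diff) >= self.threshold
        message = np.where(mask, diff, 0.0)
        # Error correction: remember the suppressed components for future iterations.
        self.residual = diff - message
        # Update the local copy of what the server can reconstruct from transmitted messages.
        self.state = self.decay * self.state + message
        return message  # only the nonzero entries (and their indices) need to be transmitted
```

In a full system the server would apply the same state update to each received message, so worker and server keep consistent reconstructions of the gradients; grad_fn could be, for example, lambda x: A.T @ (A @ x - b) for a local least-squares objective.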
Pages: 585-600
Page count: 16
Related papers
50 items in total (items [41]-[50] shown below)
  • [41] Weighted Average Consensus Algorithms in Distributed and Federated Learning. Tedeschini, Bernardo Camajori; Savazzi, Stefano; Nicoli, Monica. IEEE Transactions on Network Science and Engineering, 2025, 12(2): 1369-1382.
  • [42] Distributed Stochastic Gradient Tracking Algorithm With Variance Reduction for Non-Convex Optimization. Jiang, Xia; Zeng, Xianlin; Sun, Jian; Chen, Jie. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(9): 5310-5321.
  • [43] Distributed Online Learning of Cooperative Caching in Edge Cloud. Lyu, Xinchen; Ren, Chenshan; Ni, Wei; Tian, Hui; Liu, Ren Ping; Tao, Xiaofeng. IEEE Transactions on Mobile Computing, 2021, 20(8): 2550-2562.
  • [44] Stochastic Gradient Compression for Federated Learning Over Wireless Network. Lin, Xiaohan; Liu, Yuan; Chen, Fangjiong; Huang, Yang; Ge, Xiaohu. China Communications, 2024, 21(4): 230-247.
  • [45] Stochastic Approximation Beyond Gradient for Signal Processing and Machine Learning. Dieuleveut, Aymeric; Fort, Gersende; Moulines, Eric; Wai, Hoi-To. IEEE Transactions on Signal Processing, 2023, 71: 3117-3148.
  • [46] Linear Regression With Distributed Learning: A Generalization Error Perspective. Hellkvist, Martin; Ozcelikkale, Ayca; Ahlen, Anders. IEEE Transactions on Signal Processing, 2021, 69: 5479-5495.
  • [47] Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient. Deng, Xiaoge; Li, Dongsheng; Sun, Tao; Lu, Xicheng. IEEE Transactions on Big Data, 2025, 11(1): 234-246.
  • [48] Distributed Learning Based on 1-Bit Gradient Coding in the Presence of Stragglers. Li, Chengxi; Skoglund, Mikael. IEEE Transactions on Communications, 2024, 72(8): 4903-4916.
  • [49] Distributed Deep Learning With Gradient Compression for Big Remote Sensing Image Interpretation. Xie, Weiying; Ma, Jitao; Lu, Tianen; Li, Yunsong; Lei, Jie; Fang, Leyuan; Du, Qian. IEEE Transactions on Neural Networks and Learning Systems, 2024.
  • [50] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning. Zhang, Lin; Zhang, Longteng; Shi, Shaohuai; Chu, Xiaowen; Li, Bo. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), 2023: 361-371.