Distributed Learning With Sparsified Gradient Differences

Cited by: 10
Authors
Chen, Yicheng [1 ]
Blum, Rick S. [1 ]
Takac, Martin [2 ]
Sadler, Brian M. [3 ]
Affiliations
[1] Lehigh Univ, Bethlehem, PA 18015 USA
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi 51133, U Arab Emirates
[3] Army Res Lab, Adelphi, MD 20783 USA
Funding
U.S. National Science Foundation
Keywords
Convergence; Servers; Optimization; Signal processing algorithms; Wireless communication; Distance learning; Computer aided instruction; Communication-efficient; distributed learning; error correction; gradient compression; sparsification; wireless communications; worker-server architecture; OPTIMIZATION; DESCENT; SCHEME;
DOI
10.1109/JSTSP.2022.3162989
CLC classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline codes
0808; 0809
Abstract
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve communication efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communication load compared to the best existing algorithms without slowing down the optimization process.
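The worker-side update described in the abstract (gradient difference, magnitude-based sparsification, error correction) can be summarized in a short sketch. The snippet below is a minimal NumPy illustration, not the authors' reference implementation: the fixed magnitude threshold, the decay factor used for the linear combination of previously transmitted gradients, and all names (GDSECWorker, grad_fn, compute_message) are assumptions made for illustration only; the paper's actual thresholding and weighting rules may differ.

```python
import numpy as np

class GDSECWorker:
    """Worker-side sketch: sparsified gradient differences with error correction."""

    def __init__(self, grad_fn, dim, threshold=1e-3, decay=0.9):
        self.grad_fn = grad_fn          # local gradient oracle (assumed interface)
        self.state = np.zeros(dim)      # linear combination of previously transmitted gradients
        self.residual = np.zeros(dim)   # accumulated sparsification error (error correction)
        self.threshold = threshold      # per-component magnitude threshold (assumed rule)
        self.decay = decay              # weight of the linear combination (assumed value)

    def compute_message(self, x):
        """Return the sparsified gradient difference to transmit for iterate x."""
        grad = self.grad_fn(x)
        # Difference between the current gradient and the state built from previously
        # transmitted values, with the error suppressed in earlier rounds added back in.
        diff = grad - self.state + self.residual
        # Keep only components whose magnitude is sufficiently large; the rest are not sent.
        mask = np.abs(diff) >= self.threshold
        message = np.where(mask, diff, 0.0)
        # Error correction: remember the suppressed components for future iterations.
        self.residual = diff - message
        # Update the local copy of what the server can reconstruct from transmitted messages.
        self.state = self.decay * self.state + message
        return message  # only the nonzero entries (and their indices) need to be transmitted
```

In a full system the server would apply the same state update to each received message, so worker and server keep consistent reconstructions of the gradients; grad_fn could be, for example, lambda x: A.T @ (A @ x - b) for a local least-squares objective.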
Pages: 585-600
Page count: 16
Related papers
50 items in total (items [41]-[50] shown below)
  • [41] Weighted Average Consensus Algorithms in Distributed and Federated Learning. Tedeschini, Bernardo Camajori; Savazzi, Stefano; Nicoli, Monica. IEEE Transactions on Network Science and Engineering, 2025, 12(2): 1369-1382.
  • [42] Distributed Stochastic Gradient Tracking Algorithm With Variance Reduction for Non-Convex Optimization. Jiang, Xia; Zeng, Xianlin; Sun, Jian; Chen, Jie. IEEE Transactions on Neural Networks and Learning Systems, 2023, 34(9): 5310-5321.
  • [43] Distributed Online Learning of Cooperative Caching in Edge Cloud. Lyu, Xinchen; Ren, Chenshan; Ni, Wei; Tian, Hui; Liu, Ren Ping; Tao, Xiaofeng. IEEE Transactions on Mobile Computing, 2021, 20(8): 2550-2562.
  • [44] Stochastic Gradient Compression for Federated Learning Over Wireless Network. Lin, Xiaohan; Liu, Yuan; Chen, Fangjiong; Huang, Yang; Ge, Xiaohu. China Communications, 2024, 21(4): 230-247.
  • [45] Stochastic Approximation Beyond Gradient for Signal Processing and Machine Learning. Dieuleveut, Aymeric; Fort, Gersende; Moulines, Eric; Wai, Hoi-To. IEEE Transactions on Signal Processing, 2023, 71: 3117-3148.
  • [46] Linear Regression With Distributed Learning: A Generalization Error Perspective. Hellkvist, Martin; Ozcelikkale, Ayca; Ahlen, Anders. IEEE Transactions on Signal Processing, 2021, 69: 5479-5495.
  • [47] Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient. Deng, Xiaoge; Li, Dongsheng; Sun, Tao; Lu, Xicheng. IEEE Transactions on Big Data, 2025, 11(1): 234-246.
  • [48] Distributed Learning Based on 1-Bit Gradient Coding in the Presence of Stragglers. Li, Chengxi; Skoglund, Mikael. IEEE Transactions on Communications, 2024, 72(8): 4903-4916.
  • [49] Distributed Deep Learning With Gradient Compression for Big Remote Sensing Image Interpretation. Xie, Weiying; Ma, Jitao; Lu, Tianen; Li, Yunsong; Lei, Jie; Fang, Leyuan; Du, Qian. IEEE Transactions on Neural Networks and Learning Systems, 2024.
  • [50] Evaluation and Optimization of Gradient Compression for Distributed Deep Learning. Zhang, Lin; Zhang, Longteng; Shi, Shaohuai; Chu, Xiaowen; Li, Bo. 2023 IEEE 43rd International Conference on Distributed Computing Systems (ICDCS), 2023: 361-371.