Distributed Learning With Sparsified Gradient Differences

Cited by: 10
Authors
Chen, Yicheng [1]
Blum, Rick S. [1]
Takac, Martin [2]
Sadler, Brian M. [3]
Affiliations
[1] Lehigh Univ, Bethlehem, PA 18015 USA
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi 51133, U Arab Emirates
[3] Army Res Lab, Adelphi, MD 20783 USA
Funding
US National Science Foundation;
Keywords
Convergence; Servers; Optimization; Signal processing algorithms; Wireless communication; Distance learning; Computer aided instruction; Communication-efficient; distributed learning; error correction; gradient compression; sparsification; wireless communications; worker-server architecture; OPTIMIZATION; DESCENT; SCHEME;
DOI
10.1109/JSTSP.2022.3162989
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Code
0808; 0809;
Abstract
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communication efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector is not transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communication load compared to the best existing algorithms without slowing down the optimization process.
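To make the update structure described in the abstract concrete, below is a minimal sketch of a GD-SEC-style iteration on a synthetic least-squares problem. The fixed magnitude threshold tau, the step size alpha, the per-worker state h and residual e updates, and the example objective are illustrative assumptions only; the paper's exact sparsification condition and the parameter choices behind its convergence guarantees differ.

```python
# Minimal sketch of a GD-SEC-style loop (illustrative only): each worker sends a
# sparsified gradient difference, carries the dropped part forward as an error
# residual, and the server reconstructs an approximate full gradient from the
# per-worker states. The threshold rule, step size, and synthetic least-squares
# task are assumptions for illustration, not the paper's exact conditions.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem split across M workers (assumed example task).
M, n, d = 4, 50, 10
A = [rng.standard_normal((n, d)) for _ in range(M)]
b = [A[m] @ rng.standard_normal(d) + 0.01 * rng.standard_normal(n) for m in range(M)]

def local_grad(m, x):
    """Gradient of worker m's local least-squares objective."""
    return A[m].T @ (A[m] @ x - b[m]) / n

x = np.zeros(d)
h = [np.zeros(d) for _ in range(M)]   # state shared by worker m and the server
e = [np.zeros(d) for _ in range(M)]   # error-correction residual kept at worker m
alpha, tau = 0.05, 1e-3               # step size and sparsification threshold (assumed)
skipped = 0                           # count of components never transmitted

for k in range(200):
    for m in range(M):
        # Difference between the current gradient (plus carried-over error)
        # and the worker's previously transmitted state.
        delta = local_grad(m, x) + e[m] - h[m]
        # Sparsify: drop components whose magnitude is below the threshold.
        mask = np.abs(delta) >= tau
        delta_sparse = np.where(mask, delta, 0.0)
        skipped += int(np.sum(~mask))
        # Error correction: remember what was dropped so it can be sent later.
        e[m] = delta - delta_sparse
        # Both the worker and the server apply the sparse message to the state.
        h[m] = h[m] + delta_sparse
    # Server step: the sum of worker states approximates the full gradient.
    x = x - alpha * sum(h)

print("final gradient norm:", np.linalg.norm(sum(local_grad(m, x) for m in range(M))))
print("components not transmitted:", skipped)
```

The residual e is what makes the scheme communication-efficient without biasing the updates: components suppressed at one iteration are accumulated and sent once they become large enough, so only the sparse message ever needs to cross the wireless link.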
Pages: 585-600
Page count: 16
Related Papers
50 records in total
  • [1] Sparsified Random Partial Model Update for Personalized Federated Learning. Hu, Xinyi; Chen, Zihan; Feng, Chenyuan; Min, Geyong; Quek, Tony Q. S.; Yang, Howard H. IEEE TRANSACTIONS ON MOBILE COMPUTING, 2025, 24 (04): 3076-3091.
  • [2] Design and Optimization of Hierarchical Gradient Coding for Distributed Learning at Edge Devices. Tang, Weiheng; Li, Jingyi; Chen, Lin; Chen, Xu. IEEE TRANSACTIONS ON COMMUNICATIONS, 2024, 72 (12): 7727-7741.
  • [3] Byzantine-Robust Distributed Learning With Compression. Zhu, Heng; Ling, Qing. IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2023, 9: 280-294.
  • [4] Distributed Learning for Wireless Communications: Methods, Applications and Challenges. Qian, Liangxin; Yang, Ping; Xiao, Ming; Dobre, Octavia A.; Di Renzo, Marco; Li, Jun; Han, Zhu; Yi, Qin; Zhao, Jiarong. IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2022, 16 (03): 326-342.
  • [5] Robust Distributed Learning Against Both Distributional Shifts and Byzantine Attacks. Zhou, Guanqiang; Xu, Ping; Wang, Yue; Tian, Zhi. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024.
  • [6] Following the Correct Direction: Renovating Sparsified SGD Towards Global Optimization in Distributed Edge Learning. Ning, Wanyi; Sun, Haifeng; Fu, Xiaoyuan; Yang, Xiang; Qi, Qi; Wang, Jingyu; Liao, Jianxin; Han, Zhu. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2022, 40 (02): 499-514.
  • [7] Communication-Adaptive Stochastic Gradient Methods for Distributed Learning. Chen, Tianyi; Sun, Yuejiao; Yin, Wotao. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2021, 69: 4637-4651.
  • [8] AC-SGD: Adaptively Compressed SGD for Communication-Efficient Distributed Learning. Yan, Guangfeng; Li, Tan; Huang, Shao-Lun; Lan, Tian; Song, Linqi. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2022, 40 (09): 2678-2693.
  • [9] Byzantine Resilient Non-Convex SCSG With Distributed Batch Gradient Computations. Bulusu, Saikiran; Khanduri, Prashant; Kafle, Swatantra; Sharma, Pranay; Varshney, Pramod K. IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2021, 7: 754-766.
  • [10] Secure Distributed Optimization Under Gradient Attacks. Yu, Shuhua; Kar, Soummya. IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2023, 71: 1802-1816.