Distributed Learning With Sparsified Gradient Differences

Cited: 10
Authors
Chen, Yicheng [1 ]
Blum, Rick S. [1 ]
Takac, Martin [2 ]
Sadler, Brian M. [3 ]
Affiliations
[1] Lehigh Univ, Bethlehem, PA 18015 USA
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi 51133, U Arab Emirates
[3] Army Res Lab, Adelphi, MD 20783 USA
Funding
US National Science Foundation;
Keywords
Convergence; Servers; Optimization; Signal processing algorithms; Wireless communication; Distance learning; Computer aided instruction; Communication-efficient; distributed learning; error correction; gradient compression; sparsification; wireless communications; worker-server architecture; OPTIMIZATION; DESCENT; SCHEME;
DOI
10.1109/JSTSP.2022.3162989
Chinese Library Classification
TM [Electrical technology]; TN [Electronic technology, communication technology];
Subject Classification Codes
0808; 0809;
Abstract
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communication efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger-scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communication load compared to the best existing algorithms without slowing down the optimization process.
Pages: 585-600
Page count: 16
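The abstract describes the worker-side mechanics of GD-SEC: form the difference between the current gradient and a reference built from previously transmitted gradients, zero out components whose magnitude is too small, and carry the suppressed part forward through error correction. The Python sketch below illustrates that idea only; the function name gd_sec_worker_step, the fixed threshold, and the simple reference update (adding the transmitted difference back onto the reference) are illustrative assumptions and not the paper's exact rules.

```python
import numpy as np

def gd_sec_worker_step(grad, state, threshold=1e-3):
    """One worker-side communication round in the spirit of GD-SEC.

    Illustrative sketch only: the paper's reference-vector update,
    threshold selection, and error-correction recursion may differ.
    `state` holds the worker's reference vector `h` (built from previously
    transmitted information) and the accumulated sparsification error `e`.
    """
    h = state.setdefault("h", np.zeros_like(grad))
    e = state.setdefault("e", np.zeros_like(grad))

    # Difference between the current gradient and the reference vector,
    # plus the error left over from earlier sparsification steps.
    delta = grad - h + e

    # Component-wise sparsification: keep only sufficiently large entries.
    mask = np.abs(delta) >= threshold
    sparse_delta = np.where(mask, delta, 0.0)

    # Error correction: remember the suppressed part for future rounds.
    state["e"] = delta - sparse_delta

    # Server and worker both add the transmitted difference back onto the
    # reference vector to recover the current gradient estimate.
    state["h"] = h + sparse_delta

    return sparse_delta  # only the nonzero components need to be sent


# Tiny usage example on a synthetic gradient sequence.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = {}
    for t in range(3):
        g = rng.normal(size=5)
        msg = gd_sec_worker_step(g, state, threshold=0.5)
        print(t, np.count_nonzero(msg), "of", msg.size, "entries transmitted")
```

As in the abstract, the bit savings come from transmitting only the nonzero components of the sparsified difference, while the error term ensures that suppressed information is eventually delivered rather than lost.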