Distributed Learning With Sparsified Gradient Differences

Cited by: 10
Authors
Chen, Yicheng [1 ]
Blum, Rick S. [1 ]
Takac, Martin [2 ]
Sadler, Brian M. [3 ]
Affiliations
[1] Lehigh Univ, Bethlehem, PA 18015 USA
[2] Mohamed Bin Zayed Univ Artificial Intelligence, Abu Dhabi 51133, U Arab Emirates
[3] Army Res Lab, Adelphi, MD 20783 USA
Funding
US National Science Foundation;
Keywords
Convergence; Servers; Optimization; Signal processing algorithms; Wireless communication; Distance learning; Computer aided instruction; Communication-efficient; distributed learning; error correction; gradient compression; sparsification; wireless communications; worker-server architecture; OPTIMIZATION; DESCENT; SCHEME;
DOI
10.1109/JSTSP.2022.3162989
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808; 0809;
Abstract
A very large number of communications are typically required to solve distributed learning tasks, and this critically limits scalability and convergence speed in wireless communications applications. In this paper, we devise a Gradient Descent method with Sparsification and Error Correction (GD-SEC) to improve the communications efficiency in a general worker-server architecture. Motivated by a variety of wireless communications learning scenarios, GD-SEC reduces the number of bits per communication from worker to server with no degradation in the order of the convergence rate. This enables larger scale model learning without sacrificing convergence or accuracy. At each iteration of GD-SEC, instead of directly transmitting the entire gradient vector, each worker computes the difference between its current gradient and a linear combination of its previously transmitted gradients, and then transmits the sparsified gradient difference to the server. A key feature of GD-SEC is that any given component of the gradient difference vector will not be transmitted if its magnitude is not sufficiently large. An error correction technique is used at each worker to compensate for the error resulting from sparsification. We prove that GD-SEC is guaranteed to converge for strongly convex, convex, and nonconvex optimization problems with the same order of convergence rate as GD. Furthermore, if the objective function is strongly convex, GD-SEC has a fast linear convergence rate. Numerical results not only validate the convergence rate of GD-SEC but also explore the communication bit savings it provides. Given a target accuracy, GD-SEC can significantly reduce the communications load compared to the best existing algorithms without slowing down the optimization process.
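To make the abstract's description concrete, the Python sketch below illustrates the worker-side logic it outlines: form a gradient difference against a reference built from previously transmitted values, sparsify it by a magnitude threshold, and carry the dropped components forward as an error-correction residual. This is a minimal sketch, not the paper's exact algorithm: the function name, the weight xi, the norm-based threshold rule, and the state layout are illustrative assumptions.

```python
import numpy as np

def gd_sec_worker_step(grad, state, xi=0.5, threshold_scale=0.01):
    """One worker-side step of a GD-SEC-style update (illustrative sketch).

    grad              : current local gradient (np.ndarray)
    state["ref"]      : reference built from previously transmitted values
    state["residual"] : accumulated sparsification error (error-correction memory)
    xi                : weight of the reference in the gradient difference (assumed)
    threshold_scale   : relative magnitude threshold for dropping components (assumed)
    """
    ref = state.setdefault("ref", np.zeros_like(grad))
    residual = state.setdefault("residual", np.zeros_like(grad))

    # Gradient difference w.r.t. a linear combination of past transmissions,
    # plus the residual carried over from earlier sparsification steps.
    diff = grad - xi * ref + residual

    # Keep only components whose magnitude is large enough; zero out the rest.
    threshold = threshold_scale * np.linalg.norm(diff)
    mask = np.abs(diff) >= threshold
    sparse_diff = np.where(mask, diff, 0.0)

    # Error correction: remember what was dropped so it is re-injected next step.
    state["residual"] = diff - sparse_diff

    # Update the reference with what was actually sent; a server keeping the same
    # reference can reconstruct an approximate gradient from sparse_diff alone.
    state["ref"] = xi * ref + sparse_diff

    # In practice only the nonzero entries (and their indices) would be transmitted.
    return sparse_diff
```

Under this sketch, the server would maintain the same reference state and update it with each received sparse message, so once the iterates stabilize only a few large-magnitude components need to cross the link per iteration, which is the communication saving the abstract describes.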
Pages: 585-600
Page count: 16