Lazily Aggregated Quantized Gradient Innovation for Communication-Efficient Federated Learning

Cited by: 61
Authors
Sun, Jun [1]
Chen, Tianyi [2]
Giannakis, Georgios B. [3,4]
Yang, Qinmin [1]
Yang, Zaiyue [5]
Affiliations
[1] Zhejiang Univ, Coll Control Sci & Engn, State Key Lab Ind Control Technol, Hangzhou 310027, Peoples R China
[2] Rensselaer Polytech Inst, Dept Elect Comp & Syst Engn, Troy, NY 12180 USA
[3] Univ Minnesota, Dept Elect & Comp Engn, Minneapolis, MN 55455 USA
[4] Univ Minnesota, Digital Technol Ctr, Minneapolis, MN 55455 USA
[5] Southern Univ Sci & Technol, Dept Mech & Energy Engn, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Quantization (signal); Servers; Technological innovation; Convergence; Frequency modulation; Distributed databases; Collaborative work; Federated learning; communication-efficient; gradient innovation; quantization;
DOI
10.1109/TPAMI.2020.3033286
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper focuses on the communication-efficient federated learning problem and develops a novel distributed quantized gradient approach, characterized by adaptive communication of the quantized gradients. Specifically, federated learning builds on a server-worker infrastructure, where the workers compute local gradients and upload them to the server; the server then obtains the global gradient by aggregating all local gradients and uses it to update the model parameters. The key idea for saving worker-to-server communication is to quantize gradients and to skip less informative quantized gradient communications by reusing previous gradients. Quantizing and skipping result in 'lazy' worker-server communication, which justifies the term Lazily Aggregated Quantized (LAQ) gradient. Theoretically, the LAQ algorithm achieves the same linear convergence as gradient descent in the strongly convex case, while effecting major savings in communication in terms of transmitted bits and communication rounds. Empirically, extensive experiments on real data corroborate a significant communication reduction compared with state-of-the-art gradient- and stochastic gradient-based algorithms.
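For intuition, the following is a minimal Python sketch of the worker-side quantize-then-skip decision the abstract describes. It is not the paper's implementation: the function names (quantize, laq_worker_step) are illustrative, and the fixed threshold stands in for the paper's adaptive skipping criterion, which depends on recent model-parameter changes and the quantization error.

import numpy as np

def quantize(grad, ref, bits=4):
    # Uniformly quantize the innovation grad - ref with `bits` bits per entry.
    radius = np.max(np.abs(grad - ref)) + 1e-12   # dynamic range of the innovation
    step = 2 * radius / (2 ** bits - 1)           # quantization granularity
    levels = np.round((grad - ref + radius) / step)
    return ref + levels * step - radius           # dequantized value the server would see

def laq_worker_step(grad, last_sent, threshold, bits=4):
    # One worker-side round: quantize the local gradient, then upload or skip.
    q = quantize(grad, last_sent, bits)
    innovation = np.linalg.norm(q - last_sent) ** 2
    if innovation >= threshold:   # informative enough: upload the quantized gradient
        return q, q               # (message to server, new reference point)
    return None, last_sent        # skip; the server reuses the last uploaded gradient

# Illustrative usage: a worker with a random local gradient.
rng = np.random.default_rng(0)
msg, ref = laq_worker_step(rng.normal(size=10), last_sent=np.zeros(10), threshold=1e-3)

When msg is None, no bits are transmitted that round and the server aggregates the stale gradient it already holds for this worker, which is what makes the scheme 'lazy'.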
Pages: 2031-2044
Number of pages: 14