Decentralized Federated Averaging

Cited by: 117
Authors
Sun, Tao [1 ]
Li, Dongsheng [1 ]
Wang, Bao [2 ,3 ]
Affiliations
[1] Natl Univ Def Technol, Coll Comp, Changsha 410073, Hunan, Peoples R China
[2] Univ Utah, Sci Comp & Imaging Inst, Salt Lake City, UT 84112 USA
[3] Univ Utah, Dept Math, Salt Lake City, UT 84112 USA
Funding
US National Science Foundation; National Key R&D Program of China;
Keywords
Servers; Convergence; Costs; Training; Collaborative work; Peer-to-peer computing; Privacy; Decentralized optimization; federated averaging; momentum; stochastic gradient descent; GRADIENT; OPTIMIZATION; CONVERGENCE; ALGORITHMS; CONSENSUS;
DOI
10.1109/TPAMI.2022.3196503
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Federated averaging (FedAvg) is a communication-efficient algorithm for distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection, and a central parameter server is used to communicate between clients: it distributes the parameters to each client and collects the updated parameters back from them. FedAvg is mostly studied in a centralized fashion, which requires massive communication between the central server and the clients and can lead to channel congestion. Moreover, attacking the central server can break the privacy of the whole system. Decentralization can significantly reduce the communication load of the busiest node (the central one), because all nodes communicate only with their neighbors. To this end, in this paper we study decentralized FedAvg with momentum (DFedAvgM), implemented on clients connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate only with their neighbors. To further reduce the communication cost, we also consider a quantized DFedAvgM. The proposed algorithm involves the mixing matrix, momentum, client training with multiple local iterations, and quantization, which introduce extra terms into the Lyapunov analysis. The analysis in this paper is therefore considerably more challenging than in previous work on decentralized (momentum) SGD or FedAvg. We prove convergence of the (quantized) DFedAvgM under mild assumptions; the convergence rate improves to sublinear when the loss function satisfies the PL property. Numerically, we find that the proposed algorithm outperforms FedAvg in both convergence speed and communication cost.
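The algorithm described in the abstract can be illustrated with a minimal toy simulation. The sketch below is an illustrative assumption, not the authors' implementation: it assumes a ring topology with Metropolis mixing weights and simple quadratic local losses f_i(x) = ½‖x − c_i‖², so the global minimizer is the mean of the c_i. Each round, every client runs several local SGD-with-momentum steps, then averages parameters with its neighbors only via the mixing matrix (no central server; quantization is omitted for brevity).

```python
import numpy as np

# Hypothetical sketch of DFedAvgM on a toy problem (not the paper's code).
rng = np.random.default_rng(0)
n_clients, dim = 8, 5
targets = rng.normal(size=(n_clients, dim))  # c_i: each client's local optimum

# Metropolis mixing matrix W for an undirected ring: symmetric, doubly
# stochastic, with W[i, j] > 0 only for neighbors and self-loops.
W = np.zeros((n_clients, n_clients))
for i in range(n_clients):
    for j in ((i - 1) % n_clients, (i + 1) % n_clients):
        W[i, j] = 1.0 / 3.0              # each node has degree 2 -> 1/(deg+1)
    W[i, i] = 1.0 - W[i].sum()           # self-weight completes the row sum

def local_grad(i, x):
    """Stochastic gradient of f_i at x (exact gradient plus small noise)."""
    return (x - targets[i]) + 0.01 * rng.normal(size=dim)

x = np.zeros((n_clients, dim))           # one parameter vector per client
lr, beta, local_steps, rounds = 0.1, 0.9, 5, 200

for _ in range(rounds):
    # 1) Local training: each client runs SGD with momentum on its own data.
    new_x = x.copy()
    for i in range(n_clients):
        v = np.zeros(dim)                # momentum buffer, reset each round
        for _ in range(local_steps):
            v = beta * v + local_grad(i, new_x[i])
            new_x[i] = new_x[i] - lr * v
    # 2) Gossip: each client averages parameters with its neighbors only.
    x = W @ new_x

consensus = x.mean(axis=0)
print("distance to global optimum:",
      np.linalg.norm(consensus - targets.mean(axis=0)))
```

Because W is doubly stochastic, the gossip step preserves the average of the client parameters, so the network mean tracks the centralized FedAvg trajectory while each node exchanges messages with only two neighbors per round.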
Pages: 4289-4301
Page count: 13