Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data

Cited: 2
Authors
Zhou, Tailin [1 ]
Lin, Zehong [2 ,4 ]
Zhang, Jun [2 ,4 ]
Tsang, Danny H. K. [3 ,4 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Acad Interdisciplinary Studies, IPO, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Internet Things Thrust, Guangzhou 999077, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Federated learning; heterogeneous data; loss decomposition; loss landscape visualization; model averaging;
DOI
10.1109/TMC.2024.3406554
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Classification Code
0812 ;
Abstract
Model averaging is a widely adopted technique in federated learning (FL) that aggregates multiple client models to obtain a global model. Remarkably, model averaging in FL yields a superior global model even when client models are trained with non-convex objective functions and on heterogeneous local datasets. However, the rationale behind its success remains poorly understood. To shed light on this issue, we first visualize the loss landscape of FL over the client and global models to illustrate their geometric properties. The visualization shows that the client models surround the global model within a common basin, and interestingly, the global model may deviate from the basin's center while still outperforming the client models. To gain further insight into model averaging in FL, we decompose the expected loss of the global model into five factors related to the client models. Specifically, our analysis reveals that the global model loss after early training mainly arises from i) the client model's loss on non-overlapping data between client datasets and the global dataset and ii) the maximum distance between the global and client models. Based on the findings from our loss landscape visualization and loss decomposition, we propose applying iterative moving averaging (IMA) to the global model during the late training phase to reduce its deviation from the expected minimum, while constraining client exploration to limit the maximum distance between the global and client models. Our experiments demonstrate that incorporating IMA into existing FL methods significantly improves their accuracy and training speed across various heterogeneous data setups on benchmark datasets.
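To make the aggregation and IMA steps described in the abstract concrete, the following is a minimal PyTorch-style sketch, not the authors' released implementation; the `Client.train_local` interface, the averaging window size, the switch-over round, and the late-phase learning-rate decay are illustrative assumptions consistent with the abstract.

```python
import copy
import torch


def average_state_dicts(state_dicts):
    """Element-wise average of model state_dicts (FedAvg-style aggregation)."""
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:
        avg[key] = torch.stack([sd[key].float() for sd in state_dicts]).mean(dim=0)
    return avg


def run_round(global_model, clients, lr):
    """One FL round: each client trains a copy of the global model, then models are averaged."""
    client_states = []
    for client in clients:
        local_model = copy.deepcopy(global_model)
        client.train_local(local_model, lr=lr)  # hypothetical client-side training API
        client_states.append(local_model.state_dict())
    global_model.load_state_dict(average_state_dicts(client_states))
    return global_model


def federated_training_with_ima(global_model, clients, num_rounds,
                                ima_start_round, window=5,
                                base_lr=0.01, late_lr=0.001):
    """FedAvg with iterative moving averaging (IMA) in the late training phase.

    After `ima_start_round`, the served global model is the average of the last
    `window` round models, and a smaller learning rate constrains client
    exploration to limit client-global model distance.
    """
    recent_globals = []
    for rnd in range(num_rounds):
        in_late_phase = rnd >= ima_start_round
        lr = late_lr if in_late_phase else base_lr  # assumed exploration constraint
        global_model = run_round(global_model, clients, lr)

        if in_late_phase:
            recent_globals.append(copy.deepcopy(global_model.state_dict()))
            recent_globals = recent_globals[-window:]
            # Replace the global model with the moving average over recent rounds.
            global_model.load_state_dict(average_state_dicts(recent_globals))
    return global_model
```

In this sketch, the sliding-window average damps round-to-round oscillation of the aggregated model near the basin, while the reduced late-phase learning rate plays the role of the exploration constraint mentioned in the abstract.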
Pages: 12131-12145
Page count: 15