Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data

Cited by: 2
Authors
Zhou, Tailin [1 ]
Lin, Zehong [2 ,4 ]
Zhang, Jun [2 ,4 ]
Tsang, Danny H. K. [3 ,4 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Acad Interdisciplinary Studies, IPO, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Internet Things Thrust, Guangzhou 999077, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Federated learning; heterogeneous data; loss decomposition; loss landscape visualization; model averaging
DOI
10.1109/TMC.2024.3406554
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Model averaging is a widely adopted technique in federated learning (FL) that aggregates multiple client models to obtain a global model. Remarkably, model averaging in FL yields a superior global model, even when client models are trained with non-convex objective functions and on heterogeneous local datasets. However, the rationale behind its success remains poorly understood. To shed light on this issue, we first visualize the loss landscape of FL over client and global models to illustrate their geometric properties. The visualization shows that the client models encompass the global model within a common basin, and interestingly, the global model may deviate from the basin's center while still outperforming the client models. To gain further insights into model averaging in FL, we decompose the expected loss of the global model into five factors related to the client models. Specifically, our analysis reveals that the global model loss after early training mainly arises from i) the client model's loss on non-overlapping data between client datasets and the global dataset and ii) the maximum distance between the global and client models. Based on the findings from our loss landscape visualization and loss decomposition, we propose utilizing iterative moving averaging (IMA) on the global model at the late training phase to reduce its deviation from the expected minimum, while constraining client exploration to limit the maximum distance between the global and client models. Our experiments demonstrate that incorporating IMA into existing FL methods significantly improves their accuracy and training speed on various heterogeneous data setups of benchmark datasets.
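To make the abstract's idea concrete, below is a minimal sketch of FedAvg-style model averaging with late-phase iterative moving averaging (IMA) and a bound on client drift. All names (local_update, ima_start_round, ima_window, max_dist) and the toy least-squares objective are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch: federated model averaging with late-phase IMA.
# Assumptions: toy least-squares clients, distance-clipped local updates
# as a stand-in for "constraining client exploration".
import numpy as np

def average(models):
    """Element-wise average of a list of parameter vectors (model averaging)."""
    return np.mean(np.stack(models, axis=0), axis=0)

def local_update(global_model, client_data, lr=0.1, steps=5, max_dist=None):
    """A few local SGD steps on a toy least-squares loss.

    If max_dist is given, the resulting model is projected back onto a ball
    around the global model, limiting the global-to-client distance.
    """
    w = global_model.copy()
    X, y = client_data
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    if max_dist is not None:
        delta = w - global_model
        norm = np.linalg.norm(delta)
        if norm > max_dist:
            w = global_model + delta * (max_dist / norm)
    return w

def run_fl(clients, dim, rounds=50, ima_start_round=30, ima_window=5):
    """FedAvg loop; after ima_start_round, the global model is replaced by the
    moving average of the last ima_window round-wise averaged models."""
    global_model = np.zeros(dim)
    history = []
    for r in range(rounds):
        # Clients start from the current global model and train locally.
        client_models = [local_update(global_model, data, max_dist=1.0)
                         for data in clients]
        # Standard model averaging across clients.
        global_model = average(client_models)
        history.append(global_model.copy())
        if r + 1 >= ima_start_round:
            # Late-phase IMA: average recent global models to pull the iterate
            # toward the expected minimum of the shared basin.
            global_model = average(history[-ima_window:])
    return global_model

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, w_true = 10, np.random.default_rng(1).normal(size=10)
    # Heterogeneous clients: each sees differently scaled features.
    clients = []
    for i in range(4):
        X = rng.normal(scale=1.0 + i, size=(100, dim))
        y = X @ w_true + 0.1 * rng.normal(size=100)
        clients.append((X, y))
    w = run_fl(clients, dim)
    print("distance to target:", np.linalg.norm(w - w_true))
```

In this sketch the averaged-history model is also fed back as the starting point of the next round; whether IMA is applied only to the served model or also to the training iterate is a design choice left open here.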
Pages: 12131-12145
Number of pages: 15