Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data

Cited by: 2
Authors
Zhou, Tailin [1 ]
Lin, Zehong [2 ,4 ]
Zhang, Jun [2 ,4 ]
Tsang, Danny H. K. [3 ,4 ]
Affiliations
[1] Hong Kong Univ Sci & Technol, Acad Interdisciplinary Studies, IPO, Hong Kong, Peoples R China
[2] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
[3] Hong Kong Univ Sci & Technol, Internet Things Thrust, Guangzhou 999077, Peoples R China
[4] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Hong Kong, Peoples R China
Keywords
Federated learning; heterogeneous data; loss decomposition; loss landscape visualization; model averaging
DOI
10.1109/TMC.2024.3406554
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Model averaging is a widely adopted technique in federated learning (FL) that aggregates multiple client models to obtain a global model. Remarkably, model averaging in FL yields a superior global model, even when client models are trained with non-convex objective functions and on heterogeneous local datasets. However, the rationale behind its success remains poorly understood. To shed light on this issue, we first visualize the loss landscape of FL over client and global models to illustrate their geometric properties. The visualization shows that the client models encompass the global model within a common basin, and interestingly, the global model may deviate from the basin's center while still outperforming the client models. To gain further insights into model averaging in FL, we decompose the expected loss of the global model into five factors related to the client models. Specifically, our analysis reveals that the global model loss after early training mainly arises from i) the client model's loss on non-overlapping data between client datasets and the global dataset and ii) the maximum distance between the global and client models. Based on the findings from our loss landscape visualization and loss decomposition, we propose utilizing iterative moving averaging (IMA) on the global model at the late training phase to reduce its deviation from the expected minimum, while constraining client exploration to limit the maximum distance between the global and client models. Our experiments demonstrate that incorporating IMA into existing FL methods significantly improves their accuracy and training speed on various heterogeneous data setups of benchmark datasets.
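To make the abstract's idea concrete, below is a minimal sketch of FedAvg-style model averaging with late-phase iterative moving averaging (IMA) and a bound on client drift. All names (local_update, ima_start_round, ima_window, max_dist) and the toy least-squares objective are illustrative assumptions for this sketch, not the authors' implementation.

```python
# Minimal sketch: federated model averaging with late-phase IMA.
# Assumptions: toy least-squares clients, distance-clipped local updates
# as a stand-in for "constraining client exploration".
import numpy as np

def average(models):
    """Element-wise average of a list of parameter vectors (model averaging)."""
    return np.mean(np.stack(models, axis=0), axis=0)

def local_update(global_model, client_data, lr=0.1, steps=5, max_dist=None):
    """A few local SGD steps on a toy least-squares loss.

    If max_dist is given, the resulting model is projected back onto a ball
    around the global model, limiting the global-to-client distance.
    """
    w = global_model.copy()
    X, y = client_data
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    if max_dist is not None:
        delta = w - global_model
        norm = np.linalg.norm(delta)
        if norm > max_dist:
            w = global_model + delta * (max_dist / norm)
    return w

def run_fl(clients, dim, rounds=50, ima_start_round=30, ima_window=5):
    """FedAvg loop; after ima_start_round, the global model is replaced by the
    moving average of the last ima_window round-wise averaged models."""
    global_model = np.zeros(dim)
    history = []
    for r in range(rounds):
        # Clients start from the current global model and train locally.
        client_models = [local_update(global_model, data, max_dist=1.0)
                         for data in clients]
        # Standard model averaging across clients.
        global_model = average(client_models)
        history.append(global_model.copy())
        if r + 1 >= ima_start_round:
            # Late-phase IMA: average recent global models to pull the iterate
            # toward the expected minimum of the shared basin.
            global_model = average(history[-ima_window:])
    return global_model

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, w_true = 10, np.random.default_rng(1).normal(size=10)
    # Heterogeneous clients: each sees differently scaled features.
    clients = []
    for i in range(4):
        X = rng.normal(scale=1.0 + i, size=(100, dim))
        y = X @ w_true + 0.1 * rng.normal(size=100)
        clients.append((X, y))
    w = run_fl(clients, dim)
    print("distance to target:", np.linalg.norm(w - w_true))
```

In this sketch the averaged-history model is also fed back as the starting point of the next round; whether IMA is applied only to the served model or also to the training iterate is a design choice left open here.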
Pages: 12131-12145
Number of pages: 15