FedSA: A staleness-aware asynchronous Federated Learning algorithm with non-IID data

Cited by: 53
Authors
Chen, Ming [2 ]
Mao, Bingcheng [2 ]
Ma, Tianyi [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[2] Hithink RoyalFlush Informat Network Co Ltd, Hangzhou, Zhejiang, Peoples R China
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2021, Vol. 120
Keywords
Federated Learning; Distributed machine learning; Mobile edge computing; Non-IID data;
DOI
10.1016/j.future.2021.02.012
Chinese Library Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
This paper presents new asynchronous methods for Federated Learning (FL), one of the next-generation paradigms for Artificial Intelligence (AI) systems. We consider the two-fold challenge that lies ahead. First, non-IID (non-Independent and Identically Distributed) data across devices causes unstable performance. Second, unreliable and slow environments not only slow convergence but also cause staleness issues. To address these challenges, this study uses a bottom-up approach for analysis and algorithm design. We first reformulate FL by unifying both synchronous and asynchronous updating schemes with an asynchrony-related parameter. We theoretically analyze this new formulation and derive practical strategies for optimization. The key findings include: (1) a two-stage training strategy that accelerates training and reduces communication overhead; (2) strategies for choosing key hyperparameters optimally in each stage to maintain efficiency and robustness. With these theoretical guarantees, we propose FedSA (Federated Staleness-Aware), a novel asynchronous federated learning algorithm. We validate FedSA on different tasks under non-IID/IID and staleness settings. Our results indicate that, given a large proportion of stale devices, the proposed algorithm achieves state-of-the-art performance, outperforming existing methods in both non-IID and IID cases. (C) 2021 Elsevier B.V. All rights reserved.
Pages
1-12 (12 pages)
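The abstract only outlines the idea of merging updates asynchronously while accounting for how stale each device's contribution is; FedSA's actual update rule, two-stage schedule, and hyperparameter choices are defined in the paper itself. As a rough, hedged illustration of what a generic staleness-aware asynchronous server step can look like, the Python sketch below applies one client update with a weight that decays with its staleness. The function names (staleness_weight, server_apply_update), the polynomial decay, and the mixing-rate parameter are illustrative assumptions, not FedSA's design.

    import numpy as np

    def staleness_weight(tau, a=0.5):
        # Hypothetical polynomial decay: the more global rounds (tau) have
        # passed since the client pulled the model, the smaller the weight.
        return (tau + 1.0) ** (-a)

    def server_apply_update(global_model, client_model, client_round,
                            server_round, mixing=0.6):
        # Asynchronously merge one client's locally trained model into the
        # global model, discounting it by its staleness. This is a generic
        # staleness-aware scheme for illustration only, not FedSA's rule.
        tau = server_round - client_round       # staleness of this update
        alpha = mixing * staleness_weight(tau)  # effective mixing rate
        return (1.0 - alpha) * global_model + alpha * client_model

    # Toy usage: a stale client (pulled at round 3, applied at round 10)
    # moves the global model less than a fresh client would.
    global_model = np.zeros(4)
    client_model = np.ones(4)
    print(server_apply_update(global_model, client_model,
                              client_round=3, server_round=10))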