FedSA: A staleness-aware asynchronous Federated Learning algorithm with non-IID data

Cited by: 53
Authors
Chen, Ming [2 ]
Mao, Bingcheng [2 ]
Ma, Tianyi [1 ,2 ]
Affiliations
[1] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou, Zhejiang, Peoples R China
[2] Hithink RoyalFlush Informat Network Co Ltd, Hangzhou, Zhejiang, Peoples R China
Source
FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE | 2021, Vol. 120
Keywords
Federated Learning; Distributed machine learning; Mobile edge computing; Non-IID data;
DOI
10.1016/j.future.2021.02.012
Chinese Library Classification
TP301 [Theory, Methods]
Discipline Code
081202
Abstract
This paper presents new asynchronous methods for Federated Learning (FL), one of the next-generation paradigms for Artificial Intelligence (AI) systems. We consider the two-fold challenge that lies ahead. First, non-IID (non-Independent and Identically Distributed) data across devices causes unstable performance. Second, unreliable and slow environments not only slow convergence but also cause staleness issues. To address these challenges, this study uses a bottom-up approach for analysis and algorithm design. We first reformulate FL by unifying both synchronous and asynchronous updating schemes with an asynchrony-related parameter. We theoretically analyze this new formulation and derive practical strategies for optimization. The key findings include: (1) a two-stage training strategy that accelerates training and reduces communication overhead; (2) strategies for choosing key hyperparameters optimally in each stage to maintain efficiency and robustness. With these theoretical guarantees, we propose FedSA (Federated Staleness-Aware), a novel asynchronous federated learning algorithm. We validate FedSA on different tasks under non-IID/IID and staleness settings. Our results indicate that, given a large proportion of stale devices, the proposed algorithm achieves state-of-the-art performance, outperforming existing methods in both non-IID and IID cases. (C) 2021 Elsevier B.V. All rights reserved.
Pages
1-12 (12 pages)
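The abstract only outlines the idea of merging updates asynchronously while accounting for how stale each device's contribution is; FedSA's actual update rule, two-stage schedule, and hyperparameter choices are defined in the paper itself. As a rough, hedged illustration of what a generic staleness-aware asynchronous server step can look like, the Python sketch below applies one client update with a weight that decays with its staleness. The function names (staleness_weight, server_apply_update), the polynomial decay, and the mixing-rate parameter are illustrative assumptions, not FedSA's design.

    import numpy as np

    def staleness_weight(tau, a=0.5):
        # Hypothetical polynomial decay: the more global rounds (tau) have
        # passed since the client pulled the model, the smaller the weight.
        return (tau + 1.0) ** (-a)

    def server_apply_update(global_model, client_model, client_round,
                            server_round, mixing=0.6):
        # Asynchronously merge one client's locally trained model into the
        # global model, discounting it by its staleness. This is a generic
        # staleness-aware scheme for illustration only, not FedSA's rule.
        tau = server_round - client_round       # staleness of this update
        alpha = mixing * staleness_weight(tau)  # effective mixing rate
        return (1.0 - alpha) * global_model + alpha * client_model

    # Toy usage: a stale client (pulled at round 3, applied at round 10)
    # moves the global model less than a fresh client would.
    global_model = np.zeros(4)
    client_model = np.ones(4)
    print(server_apply_update(global_model, client_model,
                              client_round=3, server_round=10))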