FedSA: A Semi-Asynchronous Federated Learning Mechanism in Heterogeneous Edge Computing

Cited by: 130

Authors:

Ma, Qianpiao [1,2]

Xu, Yang [1,2]

Xu, Hongli [1,2]

Jiang, Zhida [1,2]

Huang, Liusheng [1,2]

Huang, He [3]

Affiliations:

[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China

[2] Univ Sci & Technol China, Suzhou Inst Adv Study, Suzhou 215123, Jiangsu, Peoples R China

[3] Soochow Univ, Sch Comp Sci & Technol, Suzhou 215123, Peoples R China

Source:

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS | 2021 / Vol. 39 / Issue 12

Funding:

U.S. National Science Foundation;

Keywords:

Training; Servers; Computational modeling; Data models; Collaborative work; Analytical models; Edge computing; federated learning; semi-asynchronous mechanism; heterogeneity; non-IID;

DOI:

10.1109/JSAC.2021.3118435

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

Federated learning (FL) involves training machine learning models over distributed edge nodes (i.e., workers) while facing three critical challenges: edge heterogeneity, non-IID data, and communication resource constraints. In synchronous FL, the parameter server has to wait for the slowest workers, leading to significant waiting time due to edge heterogeneity. Although asynchronous FL handles edge heterogeneity well, it requires frequent model transfers, resulting in massive communication resource consumption. Moreover, differences in how often workers participate in asynchronous updating may seriously hurt training accuracy, especially on non-IID data. In this paper, we propose a semi-asynchronous federated learning mechanism (FedSA), where the parameter server aggregates a certain number of local models by their arrival order in each round. We theoretically analyze the quantitative relationship between the convergence bound of FedSA and different factors, e.g., the number of participating workers in each round, the degree to which the data are non-IID, and edge heterogeneity. Based on the convergence bound, we present an efficient algorithm to determine the number of participating workers that minimizes the training completion time. To further improve the training accuracy on non-IID data, FedSA assigns adaptive learning rates to workers according to their relative participation frequency. We extend our proposed mechanism to dynamic scenarios and to scenarios with multiple learning tasks. Experimental results on the testbed show that our proposed mechanism and algorithms address the three challenges more effectively than the state-of-the-art solutions.
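
The semi-asynchronous idea described in the abstract (the server aggregates a fixed number of local models in arrival order each round) can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the paper's algorithm: the function name `run_semi_async`, the scalar "models", the per-worker `durations` and `targets`, the one-step local update, and the plain averaging rule are all hypothetical choices made for illustration.

```python
import heapq


def run_semi_async(durations, targets, m, rounds, lr=0.5):
    """Toy semi-asynchronous FL: each round aggregates the first m arrivals.

    Assumptions (not from the paper): models are scalars, local training is a
    single step toward a per-worker target, and aggregation is a plain average.
    """
    n = len(durations)
    global_model = 0.0
    # Model each worker started its current local training from (may be stale).
    start_model = [global_model] * n
    # Min-heap of (arrival_time, worker_id): the order local models arrive in.
    arrivals = [(durations[i], i) for i in range(n)]
    heapq.heapify(arrivals)
    participation = [0] * n
    now = 0.0
    for _ in range(rounds):
        # The server waits only for the first m local models.
        batch = [heapq.heappop(arrivals) for _ in range(m)]
        now = batch[-1][0]  # the round ends when the m-th model arrives
        local_models = [
            start_model[i] + lr * (targets[i] - start_model[i]) for _, i in batch
        ]
        global_model = sum(local_models) / m
        for _, i in batch:
            participation[i] += 1
            start_model[i] = global_model  # only participants get the new model
            heapq.heappush(arrivals, (now + durations[i], i))
    return global_model, participation, now


# Two fast workers (1 time unit) and two slow workers (4 time units).
model, counts, elapsed = run_semi_async(
    durations=[1.0, 1.0, 4.0, 4.0], targets=[0.0, 0.0, 10.0, 10.0], m=2, rounds=6
)
```

In this run the two fast workers each participate five times and the two slow workers once each, so the aggregated model is pulled toward the fast workers' data. That imbalance in participation frequency is the effect the abstract says FedSA counteracts with adaptive learning rates; the sketch does not implement that correction.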


Pages: 3654-3672

Page count: 19
