Over-the-Air Federated Learning From Heterogeneous Data

Cited by: 143
Authors
Sery, Tomer [1 ]
Shlezinger, Nir [1 ]
Cohen, Kobi [1 ]
Eldar, Yonina C. [2 ]
Affiliations
[1] Ben Gurion Univ Negev, Sch Elect & Comp Engn, IL-4486200 Beer Sheva, Israel
[2] Weizmann Inst Sci, Math & CS Fac, IL-761001 Rehovot, Israel
Funding
Israel Science Foundation
Keywords
Computational modeling; Servers; Convergence; Data models; Analytical models; Training; Uplink; Machine learning; optimization; gradient methods; wireless communication; multiple access; noise
DOI
10.1109/TSP.2021.3090323
Chinese Library Classification
TM [Electrical Technology]; TN [Electronic & Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
We focus on over-the-air (OTA) Federated Learning (FL), which has recently been proposed to reduce the communication overhead of FL caused by the repeated transmission of model updates by a large number of users over the wireless channel. In OTA FL, all users simultaneously transmit their updates as analog signals over a multiple access channel, and the server receives a superposition of the transmitted signals. However, this approach lets the channel noise directly affect the optimization procedure, which may degrade the accuracy of the trained model. We develop a Convergent OTA FL (COTAF) algorithm that enhances the common local stochastic gradient descent (SGD) FL algorithm by introducing precoding at the users and scaling at the server, which gradually mitigate the effect of noise. We analyze the convergence of COTAF to the loss-minimizing model and quantify the effect of a statistically heterogeneous setup, i.e., one in which the training data of each user obeys a different distribution. Our analysis reveals that COTAF achieves a convergence rate similar to that achievable over error-free channels. Our simulations demonstrate the improved convergence of COTAF over vanilla OTA local SGD when training on non-synthetic datasets. Furthermore, we numerically show that the precoding induced by COTAF notably improves both the convergence rate and the accuracy of models trained via OTA FL.
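To make the mechanism concrete, below is a minimal simulation sketch of COTAF-style OTA FL on a toy linear-regression task. The task, the per-user data distributions, and the choice of precoding factor (the per-round maximum update norm as a stand-in for the paper's expected-norm-based scaling) are illustrative assumptions, not the authors' reference implementation.

```python
import numpy as np

# Minimal sketch of OTA federated learning with COTAF-style precoding.
# The toy task and all constants below are illustrative assumptions.

rng = np.random.default_rng(0)
num_users, dim, rounds = 10, 20, 200
power, noise_std = 1.0, 0.1            # per-user power budget, channel noise std

# Statistically heterogeneous data: each user samples a different distribution.
X = [rng.normal(u * 0.1, 1.0, size=(50, dim)) for u in range(num_users)]
w_true = rng.normal(size=dim)
y = [x @ w_true + rng.normal(0, 0.1, size=50) for x in X]

w = np.zeros(dim)
for t in range(rounds):
    lr = 1.0 / (t + 10)                # decaying step size
    # Local SGD step at each user (one full-gradient step here for brevity).
    updates = []
    for u in range(num_users):
        grad = X[u].T @ (X[u] @ w - y[u]) / len(y[u])
        updates.append(-lr * grad)
    # COTAF-style precoding: scale updates up to the power budget, so the
    # effective noise shrinks as the updates themselves shrink over rounds.
    max_norm2 = max(np.sum(up ** 2) for up in updates) + 1e-12
    alpha = np.sqrt(power / max_norm2)  # assumed proxy for the paper's alpha_t
    # The MAC delivers a superposition of the analog transmissions plus noise.
    rx = sum(alpha * up for up in updates) + rng.normal(0, noise_std, size=dim)
    # The server rescales by 1/(K * alpha) to recover the averaged update.
    w += rx / (num_users * alpha)

print("final error:", np.linalg.norm(w - w_true))
```

As the local updates shrink over rounds, alpha grows under the fixed power budget, so the effective noise term rx / (K * alpha) decays; this gradual noise mitigation is the intuition behind COTAF's convergence guarantee.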
Pages: 3796-3811 (16 pages)