Convergence of Update Aware Device Scheduling for Federated Learning at the Wireless Edge

Cited by: 141
Authors
Amiri, Mohammad Mohammadi [1 ]
Gunduz, Deniz [2 ]
Kulkarni, Sanjeev R. [1 ]
Poor, H. Vincent [1 ]
Affiliations
[1] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
[2] Imperial Coll London, Dept Elect & Elect Engn, London SW7 2AZ, England
Funding
European Research Council; US National Science Foundation; UK Engineering and Physical Sciences Research Council
Keywords
Performance evaluation; Convergence; Bandwidth; Wireless networks; Propagation losses; Servers; Wireless sensor networks; Federated learning; update-aware device selection; stochastic gradient descent
DOI
10.1109/TWC.2021.3052681
CLC numbers
TM [Electrical Engineering]; TN [Electronics and Communications Technology]
Discipline codes
0808; 0809
Abstract
We study federated learning (FL) at the wireless edge, where power-limited devices with local datasets collaboratively train a joint model with the help of a remote parameter server (PS). We assume that the devices are connected to the PS through a bandwidth-limited shared wireless channel. At each iteration of FL, a subset of the devices is scheduled to transmit their local model updates to the PS over orthogonal channel resources, and each participating device must compress its model update to fit its link capacity. We design novel scheduling and resource allocation policies that decide which subset of the devices transmits at each round, and how the resources should be allocated among the participating devices, based not only on their channel conditions but also on the significance of their local model updates. We then establish convergence of a wireless FL algorithm with device scheduling, where devices have limited capacity to convey their messages. The results of numerical experiments show that the proposed scheduling policy, based on both the channel conditions and the significance of the local model updates, provides better long-term performance than scheduling policies based on only one of the two metrics. Furthermore, we observe that when the data is independent and identically distributed (i.i.d.) across devices, selecting a single device at each round provides the best performance, while when the data distribution is non-i.i.d., scheduling multiple devices at each round improves the performance. This observation is verified by the convergence result, which shows that the number of scheduled devices should increase for a less diverse and more biased data distribution.
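The update-aware scheduling idea described in the abstract can be illustrated with a minimal sketch: score each device by a weighted combination of its channel quality and the magnitude of its local model update, then schedule the top-scoring devices. Note this is an illustrative toy policy, not the paper's exact algorithm; the names `schedule_devices`, the normalization, and the weight `alpha` are assumptions made for this sketch.

```python
def schedule_devices(channel_gains, update_norms, num_scheduled, alpha=0.5):
    """Illustrative update-aware scheduling (not the paper's exact policy).

    Each device is scored by a convex combination of its normalized
    channel gain and the normalized l2-norm of its local model update;
    the `num_scheduled` highest-scoring devices are selected.
    """
    g_max = max(channel_gains)
    u_max = max(update_norms)
    # Normalize each metric to [0, 1] so the two are comparable.
    scores = [
        alpha * (g / g_max) + (1 - alpha) * (u / u_max)
        for g, u in zip(channel_gains, update_norms)
    ]
    # Rank device indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:num_scheduled]

# Device 3 has a good channel; device 1 has the most significant update.
gains = [0.9, 0.2, 0.5, 0.8]
norms = [0.1, 1.0, 0.6, 0.4]
print(schedule_devices(gains, norms, 2))  # -> [3, 1]
```

Setting `alpha=1` recovers a purely channel-based policy and `alpha=0` a purely update-based one; the abstract's finding is that combining the two outperforms either extreme.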
Pages: 3643-3658 (16 pages)