Chronos: Accelerating Federated Learning With Resource Aware Training Volume Tuning at Network Edges

Cited by: 6
Authors
Liu, Yutao [1 ]
Zhang, Xiaoning [1 ]
Zhao, Yangming [2 ]
He, Yexiao [1 ]
Yu, Shui [3 ]
Zhu, Kainan [4 ]
Affiliations
[1] Univ Elect Sci Technol China, Sch Informat Commun Engn, Chengdu 610056, Peoples R China
[2] Univ Sci Technol China, Sch Comp Sci Technol, Hefei 230052, Peoples R China
[3] Univ Technol Sydney, Sch Software, Sydney 2007, Australia
[4] Zhejiang Lab, Res Ctr Intelligent Transportat, Hangzhou 310058, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Training; Convergence; Servers; Synchronization; Computational modeling; Wireless communication; Bandwidth; Distributed machine learning; federated learning; artificial intelligence; edge computing; parallel mechanism;
DOI
10.1109/TVT.2022.3218155
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Codes
0808; 0809;
Abstract
Due to limited resources and data privacy concerns, the last decade has witnessed the rapid development of Distributed Machine Learning (DML) at network edges. Among existing DML paradigms, Federated Learning (FL) is a promising one, since in FL each client trains its local model without sharing its raw data with others. A community of clients with a common interest can jointly derive a high-performance model by periodically synchronizing the parameters of their local models with the help of a coordination server. However, FL encounters the straggler problem at network edges, which makes synchronization among clients inefficient and slows down the convergence of the learning process. To alleviate the straggler problem, this paper proposes Chronos, a method that accelerates FL through training volume tuning. More specifically, Chronos is a resource-aware method that adaptively adjusts the amount of data each client uses for training (i.e., its training volume) in each iteration, in order to eliminate the synchronization waiting time caused by heterogeneous and dynamic computing and communication resources. In addition, we theoretically analyze the convergence of Chronos in a non-convex setting and, in turn, use these results in the algorithm design of Chronos to guarantee convergence. Extensive experiments show that, compared with benchmark algorithms (i.e., BSP and SSP), Chronos improves convergence speed by up to 6.4x.
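To make the core idea concrete, below is a minimal Python sketch of resource-aware training-volume tuning in the spirit the abstract describes. It is an illustration only, not the authors' Chronos implementation: the function name tune_training_volumes, its parameters, and the example numbers are all hypothetical. The idea is to size each client's per-round training volume from its measured compute throughput and estimated communication time, so that heterogeneous clients finish a round at roughly the same wall-clock time and BSP-style synchronization waits are eliminated.

# Hypothetical sketch of resource-aware training-volume tuning
# (not the authors' released code).

def tune_training_volumes(throughputs, comm_times, round_budget):
    """Assign a training volume (number of samples) to each client.

    throughputs  -- measured training speed of each client, in samples/sec
    comm_times   -- estimated upload+download time of each client, in sec
    round_budget -- target wall-clock duration of one round, in sec
    """
    volumes = []
    for rate, t_comm in zip(throughputs, comm_times):
        # Time left for local computation after communication.
        compute_budget = max(round_budget - t_comm, 0.0)
        # Train on as many samples as fit in that budget (at least one).
        volumes.append(max(int(rate * compute_budget), 1))
    return volumes

# Example: three heterogeneous edge clients, 10-second round budget.
throughputs = [200.0, 50.0, 120.0]   # samples/sec
comm_times = [1.0, 3.0, 2.0]         # sec
print(tune_training_volumes(throughputs, comm_times, round_budget=10.0))
# -> [1800, 350, 960]: faster clients train on more data per round,
#    so all clients finish at roughly the same time.

In the paper itself this tuning is driven by the convergence analysis in the non-convex setting, which constrains how far the training volumes may deviate while still guaranteeing convergence; the sketch above captures only the resource-awareness step.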
Pages: 3889-3903
Page count: 15