Speed Up Federated Learning in Heterogeneous Environments: A Dynamic Tiering Approach

Cited by: 4
Authors
Sajjadi Mohammadabadi, Seyed Mahmoud [1]
Zawad, Syed [2 ]
Yan, Feng [3 ,4 ]
Yang, Lei [1 ]
Affiliations
[1] Univ Nevada Reno, Dept Comp Sci & Engn, Reno, NV 89557 USA
[2] IBM Res Almaden, Res Dept, San Jose, CA 95120 USA
[3] Univ Houston, Comp Sci Dept, Houston, TX 77004 USA
[4] Univ Houston, Elect & Comp Engn Dept, Houston, TX 77004 USA
Funding
US National Science Foundation
Keywords
Training; Computational modeling; Internet of Things; Servers; Federated learning; Accuracy; Dynamic scheduling; Performance evaluation; Optimization; Load modeling; Distributed optimization; edge computing; federated learning (FL); heterogeneous devices; split learning (SL)
DOI
10.1109/JIOT.2024.3487473
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Federated learning (FL) enables collaborative training of a model while keeping the training data decentralized and private. However, in Internet of Things systems, inherent heterogeneity in processing power, communication bandwidth, and task size can significantly hinder the efficient training of large models. Such heterogeneity causes wide variation in client training times, lengthening overall training and wasting the resources of faster clients. To tackle these challenges, we propose dynamic tiering-based FL (DTFL), a novel system that leverages distributed optimization principles to improve edge learning performance. Based on clients' resources, DTFL dynamically offloads part of the global model to the server, alleviating resource constraints on slower clients and speeding up training. By leveraging split learning, DTFL offloads different portions of the global model to clients in different tiers and enables each client to update the models in parallel via local-loss-based training. This reduces the computation and communication demands on resource-constrained devices, mitigating the straggler problem. DTFL introduces a dynamic tier scheduler that uses tier profiling to estimate the expected training time of each client based on its historical training time, communication speed, and dataset size, and then assigns clients to suitable tiers to minimize the overall training time in each round. We theoretically prove the convergence properties of DTFL and validate its effectiveness by training large models (ResNet-56 and ResNet-110) across varying numbers of clients (from 10 to 200) on popular image datasets (CIFAR-10, CIFAR-100, CINIC-10, and HAM10000) under both IID and non-IID data distributions. DTFL also integrates various privacy measures without sacrificing performance. Extensive experimental results show that, compared with state-of-the-art FL methods, DTFL reduces training time by up to 80% while maintaining model accuracy.
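To make the tier-scheduling idea concrete, below is a minimal Python sketch, not the authors' implementation: each client is assigned the tier that minimizes its estimated per-round time, computed from profiled compute speed, link bandwidth, and dataset size, as the abstract describes. All names (ClientProfile, estimate_round_time, assign_tiers), the tier payload sizes, and the two-term cost model are hypothetical illustrations; DTFL's actual scheduler uses tier profiling of historical training times to minimize the overall round time across all clients.

from dataclasses import dataclass

@dataclass
class ClientProfile:
    compute_s_per_sample: dict   # tier -> profiled seconds per sample on this client
    bandwidth_mbps: float        # measured communication speed
    num_samples: int             # local dataset size

# Hypothetical per-round activation/gradient payload, in megabits.
# Assumption: tier 1 keeps the most layers on the client (smaller, late-layer
# activations); tier 3 offloads the most to the server (larger, early-layer
# activations), so offloading trades compute for communication.
TIER_PAYLOAD_MBITS = {1: 60.0, 2: 120.0, 3: 200.0}

def estimate_round_time(p: ClientProfile, tier: int) -> float:
    # Expected per-round time = local compute + activation/gradient exchange.
    compute = p.compute_s_per_sample[tier] * p.num_samples
    comm = TIER_PAYLOAD_MBITS[tier] / p.bandwidth_mbps
    return compute + comm

def assign_tiers(profiles):
    # Give each client the tier minimizing its own expected time; the round
    # finishes only when the slowest client (the straggler) does.
    return [min(TIER_PAYLOAD_MBITS, key=lambda t: estimate_round_time(p, t))
            for p in profiles]

if __name__ == "__main__":
    clients = [
        # Fast CPU, poor link: prefers keeping layers local (tier 1).
        ClientProfile({1: 0.004, 2: 0.003, 3: 0.002}, bandwidth_mbps=5, num_samples=5000),
        # Slow CPU, good link: prefers offloading to the server (tier 3).
        ClientProfile({1: 0.040, 2: 0.020, 3: 0.009}, bandwidth_mbps=50, num_samples=2000),
    ]
    tiers = assign_tiers(clients)
    round_time = max(estimate_round_time(c, t) for c, t in zip(clients, tiers))
    print("tiers:", tiers, "| expected round time (s):", round(round_time, 1))

In this toy setting the compute-bound client and the bandwidth-bound client land in different tiers, which is the core intuition: per-client tier choices shrink the worst-case (straggler) time that determines the round length.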
Pages: 5026-5035
Page count: 10