Adaptive Client Clustering for Efficient Federated Learning Over Non-IID and Imbalanced Data

Times cited: 27
Authors
Gong, Biyao [1]
Xing, Tianzhang [1,2]
Liu, Zhidan [3]
Xi, Wei [4]
Chen, Xiaojiang [1,2]
Affiliations
[1] Northwest Univ, Sch Informat Sci & Technol, Xian 710069, Shaanxi, Peoples R China
[2] Northwest Univ, Internet Things Res Ctr, Xian 710069, Shaanxi, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Guangdong, Peoples R China
[4] Xi An Jiao Tong Univ, Sch Comp Sci & Technol, Xian, Shaanxi, Peoples R China
Keywords
Federated learning; clustered federated learning; non-IID data; imbalanced data; client clustering; weighted voting
DOI
10.1109/TBDATA.2022.3167994
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
Federated learning (FL) is an emerging distributed, privacy-preserving machine learning framework. However, the performance of traditional FL methods is seriously impaired by real-world data, which tend to be non-independent and identically distributed (non-IID). Recent clustered federated learning (CFL) methods eliminate the impact of non-IID data by grouping clients with similar data distributions into the same cluster. Unfortunately, existing CFL methods rely heavily on a preset number of clusters and therefore cannot cluster clients adaptively. Worse, we observe experimentally that imbalanced data across clients substantially degrade the clustering correctness of these methods. In this paper, we present AutoCFL, a novel CFL method that requires no manual intervention and eliminates the effects of both non-IID and imbalanced data simultaneously. To deal with imbalanced data, its local training adjustment strategy adaptively sets the number of local training epochs for each client. To further improve clustering correctness and adaptability, its weighted voting-based client clustering strategy automatically assigns each client to an appropriate cluster. We conduct extensive experiments to evaluate AutoCFL on three popular datasets under various data settings. The results demonstrate that AutoCFL outperforms state-of-the-art methods under non-IID and imbalanced data settings; for example, compared with the standard FL method FedAvg, it improves model accuracy by 9.24% on average while reducing communication costs by a factor of 4.67, all with adaptive client clustering.
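For orientation, the contrast the abstract draws between standard FL and clustered FL can be sketched in a few lines: FedAvg averages every client's parameters into one global model, weighting each client by its local sample count, while a clustered variant runs the same weighted averaging separately inside each cluster. This is a minimal, hypothetical Python sketch that assumes flat parameter vectors and cluster assignments that are already known; it does not implement AutoCFL's weighted-voting clustering or its local training adjustment, whose details are not given in this record. The function names `fedavg` and `clustered_fedavg` are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def fedavg(updates: Sequence[Sequence[float]], num_samples: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each client by its local sample count."""
    if not updates or len(updates) != len(num_samples):
        raise ValueError("expected one sample count per client update")
    total = sum(num_samples)
    aggregated = [0.0] * len(updates[0])
    for params, n in zip(updates, num_samples):
        weight = n / total  # clients with more data pull the average harder
        for i, p in enumerate(params):
            aggregated[i] += weight * p
    return aggregated


def clustered_fedavg(
    updates: Sequence[Sequence[float]],
    num_samples: Sequence[int],
    cluster_ids: Sequence[int],
) -> Dict[int, List[float]]:
    """Run FedAvg independently inside each cluster, yielding one model per cluster.

    Cluster assignments are a hypothetical, given input here; deciding them is
    the problem that CFL methods such as AutoCFL address.
    """
    members: Dict[int, List[int]] = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)
    return {
        cid: fedavg([updates[i] for i in idxs], [num_samples[i] for i in idxs])
        for cid, idxs in members.items()
    }


if __name__ == "__main__":
    # Toy setup: two clients share one data distribution, the third differs.
    updates = [[1.0, 0.0], [3.0, 0.0], [0.0, 10.0]]
    samples = [100, 300, 50]
    clusters = [0, 0, 1]
    print("global FedAvg:", fedavg(updates, samples))
    print("per-cluster:", clustered_fedavg(updates, samples, clusters))
```

In the toy run, the single FedAvg model lands between the two groups (about [2.22, 1.11]), whereas per-cluster averaging keeps each group's model intact ([2.5, 0.0] and [0.0, 10.0]). That is the intuition behind grouping clients with similar data distributions before aggregating.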
Pages: 1051-1065
Page count: 15
Cited references: 45