Two-Stage Clustering for Federated Learning with Pseudo Mini-batch SGD Training on Non-IID Data

Cited by: 1
Authors
Weng, Jianqing [1]
Su, Songzhi [1]
Fan, Xiaoliang [1]
Affiliations
[1] Xiamen Univ, Xiamen 361005, Peoples R China
Source
COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING, CHINESECSCW 2021, PT I | 2022 / Vol. 1491
Keywords
Federated learning; Clustering; Non-IID data
DOI
10.1007/978-981-19-4546-5_3
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The statistical heterogeneity problem in federated learning is mainly caused by skew in the data distribution among clients. In this paper, we first identify a connection between the discrepancy of clients' data distributions and the divergence of their models. Based on this insight, we introduce a K-center clustering method that groups clients by the similarity of their local update parameters, which effectively reduces data distribution skew. Second, we provide a theoretical proof that a more uniform data distribution among the clients in training reduces the growth of model divergence, thereby improving training performance in Non-IID environments. We therefore randomly divide the clients of each first-stage cluster into multiple fine-grained clusters to flatten the original data distribution. Finally, to fully leverage the data in each fine-grained cluster, we propose an intra-cluster training method named pseudo mini-batch SGD training, which conducts general mini-batch SGD training on each fine-grained cluster while the data is kept local. With this two-stage clustering mechanism, the negative effect of Non-IID data can be steadily eliminated. Experiments on two federated learning benchmarks, FEMNIST and CelebA, as well as on a manually constructed Non-IID dataset built from CIFAR10, show that the proposed method significantly improves training efficiency on Non-IID data and outperforms several widely used federated baselines.
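The two clustering stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes client updates are flat vectors compared by Euclidean distance, uses the standard greedy farthest-first heuristic for K-center, and the names `k_center`, `split_fine_grained`, and `group_size`, as well as the toy `updates`, are hypothetical.

```python
import math
import random


def k_center(points, k, first=0):
    """Greedy farthest-first traversal, a common heuristic for K-center.

    Assumption: Euclidean distance between update vectors; the abstract
    does not state which similarity measure the paper uses.
    """
    centers = [first]
    # Distance from every point to its nearest chosen center so far.
    dist = [math.dist(p, points[first]) for p in points]
    while len(centers) < k:
        nxt = max(range(len(points)), key=lambda i: dist[i])
        centers.append(nxt)
        for i, p in enumerate(points):
            dist[i] = min(dist[i], math.dist(p, points[nxt]))
    # Stage 1: assign each client to its nearest center.
    labels = [
        min(range(k), key=lambda c: math.dist(p, points[centers[c]]))
        for p in points
    ]
    return centers, labels


def split_fine_grained(members, group_size, rng):
    """Stage 2: randomly partition one first-stage cluster into small groups."""
    shuffled = members[:]
    rng.shuffle(shuffled)
    return [shuffled[i:i + group_size] for i in range(0, len(shuffled), group_size)]


# Toy local updates for six hypothetical clients (two obvious groups).
updates = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05), (5.0, 5.1), (5.1, 5.0), (4.9, 5.0)]
centers, labels = k_center(updates, k=2)

clusters = {}
for client_id, label in enumerate(labels):
    clusters.setdefault(label, []).append(client_id)

rng = random.Random(0)
fine = {label: split_fine_grained(m, 2, rng) for label, m in clusters.items()}
```

Each entry of `fine` lists the fine-grained client groups inside one first-stage cluster; the intra-cluster training step itself is not sketched here because the abstract does not give its details.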
Pages: 29-43
Page count: 15
Related papers
50 in total
  • [31] Contractible Regularization for Federated Learning on Non-IID Data
    Chen, Zifan
    Wu, Zhe
    Wu, Xian
    Zhang, Li
    Zhao, Jie
    Yan, Yangtian
    Zheng, Yefeng
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 61 - 70
  • [32] Decoupled Federated Learning for ASR with Non-IID Data
    Zhu, Han
    Wang, Jindong
    Cheng, Gaofeng
    Zhang, Pengyuan
    Yan, Yonghong
    INTERSPEECH 2022, 2022, : 2628 - 2632
  • [33] Data augmentation scheme for federated learning with non-IID data
    Tang L.
    Wang D.
    Liu S.
    Tongxin Xuebao/Journal on Communications, 2023, 44 (01): : 164 - 176
  • [34] FedNSE: Optimal Node Selection for Federated Learning with Non-IID Data
    Bansal, Sourav
    Bansal, Manav
    Verma, Rohit
    Shorey, Rajeev
    Saran, Huzur
    2023 15TH INTERNATIONAL CONFERENCE ON COMMUNICATION SYSTEMS & NETWORKS, COMSNETS, 2023,
  • [35] Overcoming Noisy Labels and Non-IID Data in Edge Federated Learning
    Xu, Yang
    Liao, Yunming
    Wang, Lun
    Xu, Hongli
    Jiang, Zhida
    Zhang, Wuyang
    IEEE TRANSACTIONS ON MOBILE COMPUTING, 2024, 23 (12) : 11406 - 11421
  • [36] Exploring personalization via federated representation Learning on non-IID data
    Jing, Changxing
    Huang, Yan
    Zhuang, Yihong
    Sun, Liyan
    Xiao, Zhenlong
    Huang, Yue
    Ding, Xinghao
    NEURAL NETWORKS, 2023, 163 : 354 - 366
  • [37] Multi-Stage Federated Learning Mechanism with non-IID Data in Internet of Vehicles
    Tang, Xiaolan
    Liang, Yuting
    Chen, Wenlong
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2024, 61 (09): : 2170 - 2184
  • [38] Federated Learning Based on Diffusion Model to Cope with Non-IID Data
    Zhao, Zhuang
    Yang, Feng
    Liang, Guirong
    PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT IX, 2024, 14433 : 220 - 231
  • [39] FedRDS: Federated Learning on Non-IID Data via Regularization and Data Sharing
    Lv, Yankai
    Ding, Haiyan
    Wu, Hao
    Zhao, Yiji
    Zhang, Lei
    APPLIED SCIENCES-BASEL, 2023, 13 (23):
  • [40] FedClust: Optimizing Federated Learning on Non-IID Data through Weight-Driven Client Clustering
    Islam, Md Sirajul
    Javaherian, Simin
    Xu, Fei
    Yuan, Xu
    Chen, Li
    Tzeng, Nian-Feng
    2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM WORKSHOPS, IPDPSW 2024, 2024, : 1184 - 1186