Two-Stage Clustering for Federated Learning with Pseudo Mini-batch SGD Training on Non-IID Data

Cited by: 1
Authors
Weng, Jianqing [1]
Su, Songzhi [1]
Fan, Xiaoliang [1]
Affiliation
[1] Xiamen Univ, Xiamen 361005, Peoples R China
Source
COMPUTER SUPPORTED COOPERATIVE WORK AND SOCIAL COMPUTING, CHINESECSCW 2021, PT I | 2022, Vol. 1491
Keywords
Federated learning; Clustering; Non-IID data
DOI
10.1007/978-981-19-4546-5_3
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Statistical heterogeneity in federated learning is mainly caused by skewed data distributions among clients. In this paper, we first identify a connection between the discrepancy of clients' data distributions and the divergence of their models. Based on this insight, we introduce a K-center clustering method that groups clients by the similarity of their local update parameters, which effectively reduces data distribution skewness. Second, we prove theoretically that a more uniform data distribution among the clients taking part in training reduces the growth of model divergence, thereby improving training performance in Non-IID environments. We therefore randomly divide the clients of each first-stage cluster into multiple fine-grained clusters to flatten the original data distribution. Finally, to fully leverage the data in each fine-grained cluster, we propose an intra-cluster training method named pseudo mini-batch SGD training, which performs general mini-batch SGD training on each fine-grained cluster while the data stays local. With this two-stage clustering mechanism, the negative effect of Non-IID data can be steadily eliminated. Experiments on two federated learning benchmarks, FEMNIST and CelebA, as well as a manually constructed Non-IID dataset based on CIFAR10, show that the proposed method significantly improves training efficiency on Non-IID data and outperforms several widely used federated baselines.
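The two clustering stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the greedy farthest-first heuristic for K-center, the Euclidean distance, the fixed `group_size`, and the function names `k_center` and `split_fine_grained` are all assumptions made for illustration, and the client updates are synthetic.

```python
import numpy as np


def k_center(updates, k, first=0):
    """Greedy farthest-first K-center over client update vectors (one row per client).

    Assumption: Euclidean distance between flattened updates; the paper may use
    a different similarity measure.
    """
    centers = [first]
    # Distance from every client to its nearest chosen center so far.
    dist = np.linalg.norm(updates - updates[first], axis=1)
    while len(centers) < k:
        nxt = int(np.argmax(dist))  # farthest client becomes the next center
        centers.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(updates - updates[nxt], axis=1))
    # Assign each client to its nearest center.
    to_centers = np.linalg.norm(
        updates[:, None, :] - updates[centers][None, :, :], axis=2
    )
    labels = np.argmin(to_centers, axis=1)
    return centers, labels


def split_fine_grained(labels, group_size, rng):
    """Randomly divide the clients of each first-stage cluster into small groups.

    Assumption: groups have a fixed target size; the last group of a cluster
    may be smaller.
    """
    groups = []
    for c in np.unique(labels):
        members = rng.permutation(np.flatnonzero(labels == c))
        for i in range(0, len(members), group_size):
            groups.append(members[i:i + group_size].tolist())
    return groups


# Synthetic example: 12 clients whose updates fall into two well-separated blobs.
rng = np.random.default_rng(0)
updates = np.concatenate([
    rng.normal(0.0, 0.1, size=(6, 4)),
    rng.normal(5.0, 0.1, size=(6, 4)),
])
centers, labels = k_center(updates, k=2)
groups = split_fine_grained(labels, group_size=3, rng=rng)
```

In this toy setting the first stage separates the two blobs and the second stage shuffles each blob's clients into groups of three. How the fine-grained clusters are then trained (the pseudo mini-batch SGD step) is not specified in the abstract and is left out here.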
Pages: 29-43
Number of pages: 15