Overcoming Client Data Deficiency in Federated Learning by Exploiting Unlabeled Data on the Server

Times Cited: 0
Authors
Park, Jae-Min [1 ]
Jang, Won-Jun [1 ]
Oh, Tae-Hyun [2 ,3 ]
Lee, Si-Hyeon [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Daejeon 34141, South Korea
[2] Pohang Univ Sci & Technol POSTECH, Dept Elect Engn, Pohang 37673, South Korea
[3] Pohang Univ Sci & Technol POSTECH, Grad Sch AI, Pohang 37673, South Korea
Keywords
Federated learning; knowledge distillation; ensemble distillation; self-supervised learning; uncertainty
DOI
10.1109/ACCESS.2024.3458911
Chinese Library Classification (CLC) Number
TP [Automation Technology; Computer Technology]
Discipline Classification Code
0812
Abstract
Federated Learning (FL) is a distributed machine learning paradigm in which multiple clients collaboratively train a server model. In practice, clients often possess limited data and are not always available to participate simultaneously, which can lead to data deficiency that degrades the entire learning process. To address this, we propose Federated learning with entropy-weighted ensemble Distillation and Self-supervised learning (FedDS), which effectively handles settings with limited data per client and few clients. The key idea is to exploit the unlabeled data available on the server when aggregating the client models into a server model: the multiple client models are distilled into the server model in an ensemble manner. To robustly weight the quality of the pseudo-labels produced by the client models, we propose an entropy weighting method and show that it tends to assign higher weights to more accurate predictions. Furthermore, we jointly leverage a separate self-supervised loss to improve the generalization of the server model. We demonstrate the effectiveness of FedDS both empirically and theoretically. On CIFAR-10, our method improves over FedAvg by 12.54% in the data-deficient regime, and by 17.16% and 23.56% in the more challenging noisy-label and Byzantine-client scenarios, respectively. On CIFAR-100 and ImageNet-100, it improves over FedAvg by 18.68% and 15.06% in the data-deficient regime, respectively.
Pages: 130007-130021
Page Count: 15
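
To make the abstract's entropy weighting concrete, the sketch below gives a minimal, illustrative PyTorch implementation; it is not the authors' code. It assumes each client model emits class logits for a batch of unlabeled server data, weights each client's per-sample prediction by a softmax over negative entropies (so more confident predictions receive higher weight), and distills the resulting ensemble pseudo-label into the server model with a KL loss. The names entropy_weights and distillation_loss are hypothetical.

import torch
import torch.nn.functional as F

def entropy_weights(client_logits):
    # client_logits: (num_clients, batch, num_classes) logits from each
    # client model on a batch of unlabeled server data.
    probs = F.softmax(client_logits, dim=-1)
    # Shannon entropy of each client's per-sample prediction -> (num_clients, batch).
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    # Lower entropy (more confident prediction) -> higher weight;
    # a softmax over the client axis normalizes the weights per sample.
    weights = F.softmax(-entropy, dim=0)
    return weights, probs

def distillation_loss(server_logits, client_logits):
    # Entropy-weighted ensemble of client predictions gives a soft
    # pseudo-label of shape (batch, num_classes).
    weights, probs = entropy_weights(client_logits)
    pseudo_label = (weights.unsqueeze(-1) * probs).sum(dim=0)
    # KL divergence pushes the server model toward the ensemble pseudo-label.
    return F.kl_div(F.log_softmax(server_logits, dim=-1),
                    pseudo_label, reduction="batchmean")

Per the abstract, this distillation term would be combined on the server with a separate self-supervised loss computed on the same unlabeled data to improve the server model's generalization; the abstract does not specify the self-supervised objective, so none is shown here.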