Overcoming Client Data Deficiency in Federated Learning by Exploiting Unlabeled Data on the Server

Cited by: 0
Authors
Park, Jae-Min [1 ]
Jang, Won-Jun [1 ]
Oh, Tae-Hyun [2 ,3 ]
Lee, Si-Hyeon [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol KAIST, Sch Elect Engn, Daejeon 34141, South Korea
[2] Pohang Univ Sci & Technol POSTECH, Dept Elect Engn, Pohang 37673, South Korea
[3] Pohang Univ Sci & Technol POSTECH, Grad Sch AI, Pohang 37673, South Korea
Keywords
Federated learning; knowledge distillation; ensemble distillation; self-supervised learning; uncertainty
DOI
10.1109/ACCESS.2024.3458911
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Federated Learning (FL) is a distributed machine learning paradigm in which multiple clients collaboratively train a server model. In practice, clients often possess limited data and are not always available to participate in FL simultaneously, which can lead to data deficiency that degrades the entire learning process. To address this, we propose Federated learning with entropy-weighted ensemble Distillation and Self-supervised learning (FedDS), which effectively handles settings with limited data per client and few participating clients. The key idea is to exploit unlabeled data available on the server when aggregating the client models into a server model: we distill the multiple client models into the server model in an ensemble fashion. To robustly weight the quality of the pseudo-labels produced by the client models, we propose an entropy weighting method and show that it tends to assign higher weights to more accurate predictions. In addition, we jointly apply a separate self-supervised loss to improve the generalization of the server model. We demonstrate the effectiveness of FedDS both empirically and theoretically. On CIFAR-10, our method improves over FedAVG by 12.54% in the data-deficient regime, and by 17.16% and 23.56% in the more challenging noisy-label and Byzantine-client scenarios, respectively. On CIFAR-100 and ImageNet-100, our method improves over FedAVG by 18.68% and 15.06% in the data-deficient regime, respectively.
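The entropy weighting idea in the abstract can be illustrated with a minimal sketch. This is an assumption-laden toy, not the paper's exact formulation: we assume each client model emits a softmax probability vector for an unlabeled server sample, weight each client by a decreasing function of its prediction entropy (here `exp(-H)`), and take the weighted average as the distillation pseudo-label. The function names `entropy` and `entropy_weighted_ensemble` are illustrative only.

```python
import numpy as np

def entropy(p, eps=1e-12):
    # Shannon entropy (natural log) of a probability vector;
    # eps guards against log(0).
    return -np.sum(p * np.log(p + eps))

def entropy_weighted_ensemble(client_probs):
    # client_probs: list of per-client softmax outputs, each of shape (num_classes,).
    # Lower entropy (a more confident prediction) receives a higher weight.
    ents = np.array([entropy(p) for p in client_probs])
    weights = np.exp(-ents)
    weights /= weights.sum()
    # The weighted average serves as the ensemble pseudo-label
    # for distilling into the server model.
    return np.sum([w * p for w, p in zip(weights, client_probs)], axis=0)

# One confident and one uncertain client prediction over 3 classes:
p_confident = np.array([0.9, 0.05, 0.05])  # low entropy
p_uncertain = np.array([0.4, 0.3, 0.3])    # high entropy
pseudo = entropy_weighted_ensemble([p_confident, p_uncertain])
```

In this toy example the confident client dominates the pseudo-label, matching the favorable tendency the paper reports (higher weight on more accurate, i.e. typically more confident, predictions).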
Pages: 130007-130021 (15 pages)