An End-Cloud Collaborative Federated Learning Debugging Framework for Data Heterogeneity

Cited by: 0
Authors
Kong, Chao [1]
Meng, Dan [2]
Fu, Zhihui [2]
Pei, Ruiguang [2]
Wu, Junjie [2]
Zhu, Haibei [3]
Zhan, Tong [4]
Affiliations
[1] Anhui Polytech Univ, Sch Comp & Informat, Wuhu 241000, Peoples R China
[2] Chinese Acad Sci, Guiyang 518000, Peoples R China
[3] Georgia Inst Technol, Sch Elect & Comp Engn, Atlanta, GA 30332 USA
[4] Salesforce TMP Org, San Francisco, CA 94105 USA
Funding
National Natural Science Foundation of China;
Keywords
Federated learning; abnormal device detection; end-cloud collaboration; data heterogeneity; data security and privacy;
DOI
10.1142/S0218126625501117
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
End-cloud collaborative computing frameworks protect the security and privacy of edge device data by enabling collaborative training of a global model without direct data exchange. In practice, however, anomalies in the training process or in edge device data can severely degrade the global model's performance or render the model unusable. Existing frameworks lack effective debugging and anomaly localization, which hinders real-time monitoring and precise identification of abnormal edge devices under data heterogeneity. In this paper, we propose FedCheck, a debugging framework for end-cloud collaborative federated learning that raises real-time alerts and detects abnormal devices on non-independent and identically distributed (non-IID) data without disrupting the regular training process. Specifically, we employ a model similarity-based method to quantitatively assess the degree of device anomaly under data heterogeneity, supporting real-time alerts during end-cloud collaboration. Furthermore, a simulation program replays the training process from recorded telemetry data, enabling backtracking debugging of any training round and of the status of edge devices. Finally, the framework removes abnormal devices and repairs the global model. Experiments on the MNIST and Fashion-MNIST datasets demonstrate that FedCheck can effectively detect and locate abnormal devices in data heterogeneity scenarios. Even in large-scale federated learning, it maintains high detection performance and exhibits good scalability.
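
The abstract does not specify the similarity metric or the alert rule, so the following is only a minimal illustrative sketch of the general idea of scoring devices by model similarity, not FedCheck's actual method. It assumes each device's update is a flat list of floats, uses cosine similarity against a coordinate-wise median update as the reference, and flags devices above a hypothetical threshold; the names `anomaly_scores`, `flag_abnormal`, and the value `0.5` are all assumptions made for illustration.

```python
import math
from statistics import median


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def anomaly_scores(updates):
    """Score each device as 1 - cosine similarity to the coordinate-wise median update.

    `updates` maps a device id to its flattened model update (assumed format).
    Scores near 0 mean the device agrees with the majority; scores near 2 mean
    it points in the opposite direction.
    """
    dims = len(next(iter(updates.values())))
    reference = [median(u[i] for u in updates.values()) for i in range(dims)]
    return {dev: 1.0 - cosine_similarity(u, reference) for dev, u in updates.items()}


def flag_abnormal(updates, threshold=0.5):
    """Return the sorted ids of devices whose score exceeds the (hypothetical) threshold."""
    scores = anomaly_scores(updates)
    return sorted(dev for dev, score in scores.items() if score > threshold)


# Toy round: three devices roughly agree, one sends an inverted update.
updates = {
    "dev_a": [0.9, 1.1, 1.0],
    "dev_b": [1.0, 0.9, 1.1],
    "dev_c": [1.1, 1.0, 0.9],
    "dev_d": [-1.0, -1.0, -1.0],
}
flagged = flag_abnormal(updates)
print(flagged)
```

A median reference is used here because it is less sensitive to a single outlying device than a plain mean; under strongly non-IID data, a fixed threshold like this one would likely need tuning or replacement with a per-round statistic.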
Pages: 26