Sample-level Data Selection for Federated Learning

Cited by: 76
Authors
Li, Anran [1 ]
Zhang, Lan [1 ]
Tan, Juntao [1 ]
Qin, Yaxuan [1 ]
Wang, Junhao [1 ]
Li, Xiang-Yang [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei, Peoples R China
Source
IEEE CONFERENCE ON COMPUTER COMMUNICATIONS (IEEE INFOCOM 2021) | 2021
Funding
National Key R&D Program of China; National Natural Science Foundation of China
DOI
10.1109/INFOCOM42981.2021.9488723
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Classification Code
0812
Abstract
Federated learning (FL) enables participants to collaboratively construct a global machine learning model without sharing their local training data with a remote server. In FL systems, the selection of training samples has a significant impact on model performance; e.g., selecting participants whose datasets contain erroneous samples, skewed category distributions, or low content diversity leads to inaccurate and unstable models. In this work, we aim to solve the pressing optimization problem of selecting a collection of high-quality training samples for a given FL task under a monetary budget in a privacy-preserving way, which is extremely challenging without visibility into participants' local data and training processes. We provide a systematic analysis of the important data-related factors affecting model performance and propose a holistic design to privately and efficiently select high-quality data samples considering all these factors. We verify the merits of our proposed solution with extensive experiments on a real AIoT system with 50 clients, including 20 edge computers, 20 laptops, and 10 desktops. The experimental results validate that our solution achieves accurate and efficient selection of high-quality data samples, and consequently an FL model with faster convergence and higher accuracy than existing solutions.
Pages: 10