Federated Visual Classification with Real-World Data Distribution

Cited by: 95
Authors
Hsu, Tzu-Ming Harry [1, 2]
Qi, Hang [2 ]
Brown, Matthew [2 ]
Affiliations
[1] MIT, CSAIL, Cambridge, MA 02139 USA
[2] Google Res, Seattle, WA 98103 USA
Source
COMPUTER VISION - ECCV 2020, PT X | 2020 / Vol. 12355
DOI
10.1007/978-3-030-58607-2_5
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Federated Learning enables visual models to be trained on-device, bringing advantages for user privacy (data need never leave the device), but challenges in terms of data diversity and quality. Whilst typical models in the datacenter are trained using data that are independent and identically distributed (IID), data at source are typically far from IID. Furthermore, differing quantities of data are typically available at each device (imbalance). In this work, we characterize the effect these real-world data distributions have on distributed learning, using as a benchmark the standard Federated Averaging (FedAvg) algorithm. To do so, we introduce two new large-scale datasets for species and landmark classification, with realistic per-user data splits that simulate real-world edge learning scenarios. We also develop two new algorithms (FedVC, FedIR) that intelligently resample and reweight over the client pool, bringing large improvements in accuracy and stability in training. The datasets are made available online.
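For readers unfamiliar with the FedAvg baseline named in the abstract, the following is a minimal sketch of its server-side aggregation step as it is commonly described: client model parameters are averaged, weighted by each client's local example count. The function name, argument names, and the flat-list parameter representation are illustrative assumptions, not taken from the paper, and the sketch does not implement FedVC or FedIR.

```python
from typing import List, Sequence


def fedavg_aggregate(
    client_weights: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
) -> List[float]:
    """Average client parameter vectors, weighted by local example counts.

    Hypothetical helper for illustration only: each client's model is
    assumed to be flattened into a list of floats of equal length.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one example count per client model")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of examples must be positive")

    dim = len(client_weights[0])
    aggregated = [0.0] * dim
    for weights, n_examples in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all client models must have the same size")
        share = n_examples / total  # clients with more data count for more
        for i, w in enumerate(weights):
            aggregated[i] += share * w
    return aggregated


# Two clients with imbalanced data: the second holds three times as many examples.
global_model = fedavg_aggregate([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Because the weights follow the example counts, a client with far more data than the others dominates the average, which is one way the imbalance mentioned in the abstract can affect training.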
Pages: 76-92
Number of pages: 17