Approaches to address the Data Skew Problem in Federated Learning

Cited by: 19
Authors
Verma, Darshika C. [1]
White, Graham [2]
Julier, Simon [3]
Pasteris, Stephen [3]
Chakraborty, Supriyo [1]
Cirincione, Greg [4]
Affiliations
[1] IBM Corp, TJ Watson Res Ctr, POB 218, Yorktown Hts, NY 10598 USA
[2] IBM Res, Hursley Pk, Hursley SO21 2JN, England
[3] UCL, 66 Gower St, London WC1E 6BT, England
[4] US Army Res Lab, 2800 Powder Mill Rd, Adelphi, MD USA
Source
ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING FOR MULTI-DOMAIN OPERATIONS APPLICATIONS | 2019 / Vol. 11006
Keywords
Federated Learning; AI; Data Skew; Coalition Operations; Future Battlespace; SMOTE
DOI
10.1117/12.2519621
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification code
081104; 0812; 0835; 1405
Abstract
Federated learning creates an AI model from multiple data sources without moving large amounts of data to a central environment. It can be very useful in a tactical coalition environment, where each coalition partner collects data individually but network connectivity is inadequate to move that data to a central location. However, data collected this way is often dirty and imperfect. It can be imbalanced, and in some cases some classes are completely missing from the data held by some coalition partners. Under these conditions, traditional approaches to federated learning can produce highly inaccurate models. In this paper, we propose approaches that can yield good machine learning models even when the data is highly skewed, and we study their performance under different environments.
Pages: 16