On oversampling imbalanced data with deep conditional generative models

Cited by: 37
Authors
Fajardo, Val Andrei [1]
Findlay, David [1]
Jaiswal, Charu [1]
Yin, Xinshang [1]
Houmanfar, Roshanak [1]
Xie, Honglei [1]
Liang, Jiaxi [1]
She, Xichen [1]
Emerson, D. B. [1]
Affiliations
[1] Integrate Ai, 480 Univ Ave, Toronto, ON, Canada
Keywords
Deep generative models; Conditional variational autoencoders; Class imbalance; Oversampling
DOI
10.1016/j.eswa.2020.114463
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Class-imbalanced datasets are common in real-world applications ranging from credit card fraud detection to rare disease diagnosis. Recently, deep generative models have proved successful for an array of machine learning problems such as semi-supervised learning, transfer learning, and recommender systems. However, their application to class imbalance situations is limited. In this paper, we consider class-conditional variants of generative adversarial networks and variational autoencoders and apply them to the imbalance problem. The main question we seek to answer is whether deep conditional generative models can effectively learn the distributions of minority classes so as to produce synthetic observations that ultimately lead to improvements in the performance of a downstream classifier. The numerical results show that this is indeed true and that deep generative models outperform traditional oversampling methods in many circumstances, especially in cases of severe imbalance.
Pages: 12