Learning from class-imbalanced data using misclassification-focusing generative adversarial networks

Cited by: 9
Authors
Yun, Jaesub [1]
Lee, Jong-Seok [1]
Affiliations
[1] Sungkyunkwan Univ, Dept Ind Engn, Suwon 16419, South Korea
Keywords
Class imbalance; Oversampling; Generative adversarial networks; End-to-end learning; Deep learning; Classification; SMOTE
DOI
10.1016/j.eswa.2023.122288
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper presents a novel end-to-end oversampling-classification approach for imbalanced data classification, which we refer to as the imbalanced data-classifying generative adversarial network (ImbGAN). ImbGAN embeds a classifier within a GAN and consists of five components: (1) a generator, (2) a discriminator, (3) a classifier, (4) storage for misclassified minority class data, and (5) storage for artificial minority class data. Through iterative interaction with the embedded classifier, the generator and discriminator produce artificial minority class instances that resemble the minority class instances the classifier misclassifies; the three networks are thus trained iteratively and simultaneously. The misclassified and artificial minority class instances are kept in the fourth and fifth components, respectively, and both stores are updated as iterations proceed. Our method obtains the final classification model from a single learning process, whereas most artificial data generation methods for imbalanced data classification require an additional step to train a classifier after the data are generated. Numerical experiments on tabular, image, and text datasets confirm that the proposed method outperforms well-known synthetic sampling methods.
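The interaction among the five components described in the abstract could be sketched roughly as below. This is not the authors' implementation. Logistic regression stands in for both the classifier and the discriminator, an affine map of Gaussian noise stands in for the generator, and the three models are updated in alternation within each round. The rules for refreshing the two stores, the hyperparameters, and every name (`Logistic`, `LinearGenerator`, `train_imbgan_sketch`) are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-np.clip(a, -30, 30)))

class Logistic:
    """Logistic regression; a stand-in for the classifier and the discriminator."""
    def __init__(self, dim):
        self.w, self.b = np.zeros(dim), 0.0
    def prob(self, x):
        return sigmoid(x @ self.w + self.b)
    def step(self, x, y, lr=0.1):
        err = self.prob(x) - y              # cross-entropy gradient w.r.t. the logit
        self.w -= lr * x.T @ err / len(x)
        self.b -= lr * err.mean()

class LinearGenerator:
    """Affine map from noise to data space; a stand-in for a deep generator."""
    def __init__(self, dim):
        self.dim, self.A, self.c = dim, np.eye(dim), np.zeros(dim)
    def sample(self, n):
        z = rng.normal(size=(n, self.dim))
        return z, z @ self.A + self.c
    def step(self, disc, n, lr=0.05):
        z, x = self.sample(n)
        # Non-saturating loss -log D(G(z)); its gradient w.r.t. x is -(1 - D(x)) * w.
        grad_x = -(1.0 - disc.prob(x))[:, None] * disc.w[None, :]
        self.A -= lr * z.T @ grad_x / n
        self.c -= lr * grad_x.mean(axis=0)

def train_imbgan_sketch(x_maj, x_min, rounds=30, inner=20):
    dim = x_maj.shape[1]
    gen, disc, clf = LinearGenerator(dim), Logistic(dim), Logistic(dim)
    misclassified = x_min.copy()        # component 4 (assumed to start as all minority data)
    artificial = np.empty((0, dim))     # component 5 (starts empty)
    for _ in range(rounds):
        # Classifier: train on real data plus the stored artificial minority data.
        x = np.vstack([x_maj, x_min, artificial])
        y = np.concatenate([np.zeros(len(x_maj)),
                            np.ones(len(x_min) + len(artificial))])
        for _ in range(inner):
            clf.step(x, y)
        # Component 4: keep the minority instances the classifier currently gets wrong.
        missed = x_min[clf.prob(x_min) < 0.5]
        if len(missed) > 0:             # assumption: keep the old store if none are missed
            misclassified = missed
        # Generator and discriminator: imitate the misclassified minority data.
        m = len(misclassified)
        for _ in range(inner):
            _, fake = gen.sample(m)
            disc.step(np.vstack([misclassified, fake]),
                      np.concatenate([np.ones(m), np.zeros(m)]))
            gen.step(disc, m)
        # Component 5: refresh with enough artificial data to balance the classes.
        _, artificial = gen.sample(len(x_maj) - len(x_min))
    return clf, artificial

# Toy data: 200 majority and 20 minority points in two dimensions.
x_maj = rng.normal(loc=0.0, size=(200, 2))
x_min = rng.normal(loc=2.0, size=(20, 2))
clf, artificial = train_imbgan_sketch(x_maj, x_min)
print(artificial.shape)
```

The classifier returned by the loop is the final model, which mirrors the single learning process mentioned in the abstract: no separate classifier is trained after data generation. Whether the paper refreshes the stores or mixes the losses in this way is not stated in the abstract.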
Pages: 17