RGAN-EL: A GAN and ensemble learning-based hybrid approach for imbalanced data classification

Cited: 59
Authors
Ding, Hongwei [1 ,2 ]
Sun, Yu [3 ]
Wang, Zhenyu [1 ,2 ]
Huang, Nana [1 ]
Shen, Zhidong [1 ]
Cui, Xiaohui [1 ,2 ]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Key Lab Aerosp Informat Secur & Trusted Comp, Minist Educ, Wuhan, Peoples R China
[2] Jiaxing Inst Future Food, Jiaxing, Peoples R China
[3] Natl Univ Singapore, Dept Comp Sci, Singapore, Singapore
Funding
National Key Research and Development Program of China;
Keywords
Imbalanced data; Generative adversarial networks; Data sampling; Ensemble learning; CLASSIFIERS; SMOTE;
DOI
10.1016/j.ipm.2022.103235
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Imbalanced sample distributions are a major cause of performance degradation in machine learning algorithms. To address this, this study proposes a hybrid framework (RGAN-EL) that combines generative adversarial networks (GANs) with ensemble learning to improve classification performance on imbalanced data. First, we propose a training-sample selection strategy based on roulette wheel selection, so that the GAN pays more attention to the class-overlap region when fitting the sample distribution. Second, we design two kinds of generator training loss and propose a noise-sample filtering method to improve the quality of the generated samples. The minority class is then oversampled with the improved RGAN to obtain a balanced training set. Finally, training and prediction are carried out with an ensemble learning strategy. We conducted experiments on 41 real imbalanced datasets using two evaluation metrics, F1-score and AUC, comparing RGAN-EL with six typical ensemble learning methods and RGAN with three typical GAN models. The results show that RGAN-EL significantly outperforms the six ensemble learning methods, and RGAN improves substantially over the three classical GAN models.
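The roulette wheel selection strategy mentioned in the abstract can be sketched as follows. This is a minimal illustration of the generic technique, not the paper's exact formulation: the per-sample weights (here assumed to reflect some hypothetical class-overlap degree, so that boundary samples are drawn more often) are an assumption for demonstration.

```python
import random

def roulette_wheel_select(samples, weights, k):
    """Pick k samples (with replacement), each with probability
    proportional to its weight -- classic roulette wheel selection.
    weights: non-negative scores, e.g. a hypothetical class-overlap
    degree, so samples near the class boundary are chosen more often."""
    total = sum(weights)
    # Cumulative thresholds partition [0, 1) into slots whose
    # widths are proportional to the weights.
    cum, acc = [], 0.0
    for w in weights:
        acc += w
        cum.append(acc / total)
    chosen = []
    for _ in range(k):
        r = random.random()  # spin the wheel: r in [0, 1)
        for sample, threshold in zip(samples, cum):
            if r < threshold:
                chosen.append(sample)
                break
    return chosen
```

For real use, Python's standard library offers the same weighted draw directly via `random.choices(samples, weights=weights, k=k)`.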
Pages: 20