A new imbalanced data oversampling method based on Bootstrap method and Wasserstein Generative Adversarial Network

Cited by: 1
Authors
Hou, Binjie [1]
Chen, Gang [1]
Affiliations
[1] Dalian Maritime Univ, Dept Math, Dalian 116026, Peoples R China
Keywords
imbalanced data; generative adversarial networks (GANs); Bootstrap method (BM); data generation; probability distribution; SAMPLING METHOD; CLASSIFICATION; SMOTE; NOISY
DOI
10.3934/mbe.2024190
Chinese Library Classification number
Q [Biological Sciences]
Discipline classification codes
07; 0710; 09
Abstract
Because they are strongly biased in favor of the majority class, traditional machine learning classifiers face a great challenge when biological data are class-imbalanced. More recently, generative adversarial networks (GANs) have been applied to imbalanced data classification. For GANs, the distribution of the minority class data fed into the discriminator is unknown, while the input to the generator is random noise (z) drawn from a standard normal distribution N(0, 1). This mismatch inevitably increases the training difficulty of the network and reduces the quality of the generated data. To solve this problem, we propose a new oversampling algorithm that combines the Bootstrap method (BM) and the Wasserstein GAN (BM-WGAN). In our approach, the input to the generator network is data (z) drawn from the minority class distribution estimated by the BM. Once network training is complete, the generator is used to synthesize minority class data. Through these steps, the generator can learn useful features from the minority class and generate realistic-looking minority class samples. The experimental results indicate that BM-WGAN greatly improves classification performance compared to other oversampling algorithms. The BM-WGAN implementation is available at: https://github.com/ithbjgit1/BMWGAN.git.
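The central idea of the abstract, replacing N(0, 1) noise with generator input drawn from a bootstrap estimate of the minority class distribution, can be illustrated with a minimal sketch. This is not the authors' implementation (see the repository linked above for that). It assumes, purely for illustration, that the Bootstrap method is used to estimate a per-feature mean and standard deviation and that z is then drawn from the corresponding normal distribution. The helper names `bootstrap_estimate` and `sample_generator_input` and the toy data are hypothetical.

```python
import random
import statistics


def bootstrap_estimate(minority, n_resamples=200, seed=0):
    """Bootstrap estimate of per-feature mean and standard deviation.

    Assumption: the minority distribution is summarized by these two
    statistics, averaged over resamples drawn with replacement.
    """
    rng = random.Random(seed)
    n = len(minority)
    n_features = len(minority[0])
    means = [0.0] * n_features
    stds = [0.0] * n_features
    for _ in range(n_resamples):
        # Resample n minority rows with replacement.
        sample = [minority[rng.randrange(n)] for _ in range(n)]
        for j in range(n_features):
            column = [row[j] for row in sample]
            means[j] += statistics.fmean(column) / n_resamples
            stds[j] += statistics.pstdev(column) / n_resamples
    return means, stds


def sample_generator_input(means, stds, batch_size, seed=1):
    """Draw generator input z from the estimated distribution, not N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(means, stds)]
            for _ in range(batch_size)]


# Toy minority class with two features (illustrative values only).
minority = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.5], [1.1, 10.5]]
means, stds = bootstrap_estimate(minority)
z = sample_generator_input(means, stds, batch_size=8)
```

In a full pipeline, batches like `z` would be fed to the WGAN generator during training, with the discriminator (critic) receiving the real minority samples.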
Pages: 4309-4327
Number of pages: 19
Related papers
40 in total
[1]  
Arjovsky M, 2017, PR MACH LEARN RES, V70
[2]  
Batista GEAPA, 2005, LECT NOTES COMPUT SC, V3646, P24
[3]   LoRAS: an oversampling approach for imbalanced datasets [J].
Bej, Saptarshi ;
Davtyan, Narek ;
Wolfien, Markus ;
Nassar, Mariam ;
Wolkenhauer, Olaf .
MACHINE LEARNING, 2021, 110 (02) :279-301
[4]   Targeting class imbalance problem using GAN [J].
Bhagwani, Hitesh ;
Agarwal, Sonali ;
Kodipalli, Ashwini ;
Martis, Roshan Joy .
2021 5TH INTERNATIONAL CONFERENCE ON ELECTRICAL, ELECTRONICS, COMMUNICATION, COMPUTER TECHNOLOGIES AND OPTIMIZATION TECHNIQUES (ICEECCOT), 2021, :318-322
[5]  
Bunkhumpornpat C, 2009, LECT NOTES ARTIF INT, V5476, P475, DOI 10.1007/978-3-642-01307-2_43
[6]  
Cao L, 2016, 2016 17TH INTERNATIONAL CONFERENCE ON PARALLEL AND DISTRIBUTED COMPUTING, APPLICATIONS AND TECHNOLOGIES (PDCAT), P325, DOI [10.1109/PDCAT.2016.076, 10.1109/PDCAT.2016.75]
[7]  
Chawla NV, 2010, DATA MINING AND KNOWLEDGE DISCOVERY HANDBOOK, SECOND EDITION, P875, DOI 10.1007/978-0-387-09823-4_45
[8]   SMOTE: Synthetic minority over-sampling technique [J].
Chawla, Nitesh V. ;
Bowyer, Kevin W. ;
Hall, Lawrence O. ;
Kegelmeyer, W. Philip .
2002, American Association for Artificial Intelligence (16)
[9]   SMOTEBoost: Improving prediction of the minority class in boosting [J].
Chawla, NV ;
Lazarevic, A ;
Hall, LO ;
Bowyer, KW .
KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS, 2003, 2838 :107-119
[10]   Generative Adversarial Networks: An Overview [J].
Creswell, Antonia ;
White, Tom ;
Dumoulin, Vincent ;
Arulkumaran, Kai ;
Sengupta, Biswa ;
Bharath, Anil A. .
IEEE SIGNAL PROCESSING MAGAZINE, 2018, 35 (01) :53-65