VGAN-BL: imbalanced data classification based on generative adversarial network and biased loss

Cited by: 3
Authors
Ding, Hongwei [1 ,2 ]
Sun, Yu [1 ,3 ]
Huang, Nana [2 ]
Cui, Xiaohui [1 ,2 ]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Wuhan Univ, Key Lab Aerosp Informat Secur & Trusted Comp, Minist Educ, Wuhan, Peoples R China
[3] Natl Univ Singapore, Sch Comp, Singapore, Singapore
Funding
National Key Research and Development Program of China;
Keywords
Imbalanced data; Undersampling; Oversampling; VGAN-BL; SAMPLING METHOD; SMOTE;
DOI
10.1007/s00521-023-09180-x
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Imbalanced data classification aims to address the unfair learning caused by large differences in class distribution. Traditional classifiers are designed for balanced data, and their performance declines sharply on imbalanced data. Balancing the majority-class and minority-class samples before classification is therefore a popular strategy for imbalanced learning. Current data-balancing methods mainly comprise oversampling and undersampling. However, existing undersampling methods risk losing important sample information, while oversampling methods cannot effectively fit the global distribution and tend to generate noise. In recent years, generative adversarial networks (GANs) have shown great potential for fitting real sample distributions. Building on this, this paper proposes VGAN-BL, a model that combines an improved GAN with a biased loss, to address learning under imbalanced conditions. In the improved GAN, a variational autoencoder (VAE) generates latent vectors that follow the posterior distribution and serve as the input of the GAN, and a KL similarity-measurement loss is introduced into the generator to improve the quality of the generated minority samples. In addition, we propose a discriminator-based method for defining a biased loss to improve classifier performance. Experiments on 20 real datasets show that the proposed method significantly improves classification performance compared with other advanced methods. The source code is available at https://github.com/universuen/VGAN-BL.
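The two loss ingredients named in the abstract can be illustrated with a minimal sketch. It is not taken from the VGAN-BL repository. The KL term below is the standard closed form for a diagonal Gaussian posterior against a standard normal prior, as commonly used in VAEs. The weighted cross-entropy is only a stand-in for the idea of a biased loss: the abstract does not give the discriminator-based definition, so the function names and the fixed `minority_weight` parameter are assumptions made for illustration.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), the usual VAE closed form."""
    return -0.5 * sum(
        1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, log_var)
    )


def weighted_bce(probs, labels, minority_weight):
    """Binary cross-entropy with minority samples (label 1) up-weighted.

    Assumption: a fixed per-class weight. The paper derives its bias from
    the discriminator instead; that definition is not reproduced here.
    """
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        w = minority_weight if y == 1 else 1.0
        total += -w * (y * math.log(p + eps) + (1 - y) * math.log(1.0 - p + eps))
    return total / len(probs)


# A posterior identical to the prior has zero divergence.
kl_zero = kl_to_standard_normal([0.0, 0.0], [0.0, 0.0])
# Shifting one mean by 1 with unit variance costs 0.5 * 1**2.
kl_shifted = kl_to_standard_normal([1.0, 0.0], [0.0, 0.0])

# Two majority samples (label 0) and one poorly predicted minority sample.
probs = [0.9, 0.2, 0.3]
labels = [0, 0, 1]
plain = weighted_bce(probs, labels, 1.0)
biased = weighted_bce(probs, labels, 5.0)
```

With `minority_weight` above 1, the misclassified minority sample contributes more to the loss than it does under plain cross-entropy, which is the general effect a biased loss aims for.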
Pages: 2883-2899
Number of pages: 17
Related papers
50 in total
[41]   Semi-supervised Classification Based Mixed Sampling for Imbalanced Data [J].
Zhao, Jianhua ;
Liu, Ning .
OPEN PHYSICS, 2019, 17 (01) :975-983
[42]   GENERATIVE ADVERSARIAL NETWORK FOR IMPROVING DEEP LEARNING BASED MALWARE CLASSIFICATION [J].
Lu, Yan ;
Li, Jiang .
2019 WINTER SIMULATION CONFERENCE (WSC), 2019, :584-593
[43]   Application of Generative Adversarial Networks and Shapley Algorithm Based on Easy Data Augmentation for Imbalanced Text Data [J].
Wu, Jheng-Long ;
Huang, Shuoyen .
APPLIED SCIENCES-BASEL, 2022, 12 (21)
[44]   Radial-Based Undersampling for imbalanced data classification [J].
Koziarski, Michal .
PATTERN RECOGNITION, 2020, 102
[45]   ImGAGN: Imbalanced Network Embedding via Generative Adversarial Graph Networks [J].
Qu, Liang ;
Zhu, Huaisheng ;
Zheng, Ruiqi ;
Shi, Yuhui ;
Yin, Hongzhi .
KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, :1390-1398
[46]   Generative Adversarial Network Based Multi-class Imbalanced Fault Diagnosis of Rolling Bearing [J].
Liu, Qianjun ;
Ma, Guijun ;
Cheng, Cheng .
2019 4TH INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY (ICSRS 2019), 2019, :318-324
[47]   Imbalanced Data Classification Based on Clustering [J].
Li, Hu ;
Zou, Peng ;
Han, Weihong ;
Xia, Rongze .
COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION III, 2014, 443 :741-745
[48]   An Approach to Imbalanced Data Classification Based on Instance Selection and Over-Sampling [J].
Czarnowski, Ireneusz ;
Jedrzejowicz, Piotr .
COMPUTATIONAL COLLECTIVE INTELLIGENCE, PT I, 2019, 11683 :601-610
[49]   Synthetic aperture radar automatic target recognition based on cost-sensitive awareness generative adversarial network for imbalanced data [J].
Qin, Jikai ;
Liu, Zheng ;
Ran, Lei ;
Xie, Rong .
IET RADAR SONAR AND NAVIGATION, 2024, 18 (09) :1391-1408
[50]   Cancer diagnosis using generative adversarial networks based on deep learning from imbalanced data [J].
Xiao, Yawen ;
Wu, Jun ;
Lin, Zongli .
COMPUTERS IN BIOLOGY AND MEDICINE, 2021, 135 (135)