A novel oversampling method based on Wasserstein CGAN for imbalanced classification

Cited by: 1
Authors
Zhou, Hongfang [1 ,2 ]
Pan, Heng [1 ,2 ]
Zheng, Kangyun [1 ,2 ]
Wu, Zongling [1 ,2 ]
Xiang, Qingyu [3 ]
Affiliations
[1] Xian Univ Technol, Sch Comp Sci & Engn, Xian 710048, Peoples R China
[2] Shaanxi Key Lab Network Comp & Secur Technol, Xian 710048, Peoples R China
[3] Northwest Univ, Sch Chem Engn, Xian 710048, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced data; Classification; Oversampling; K-means; CGAN; k-nearest neighbors; SMOTE;
DOI
10.1186/s42400-024-00290-0
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Class imbalance is a crucial challenge in classification tasks, and in recent years, with advances in deep learning, research on oversampling techniques based on GANs has proliferated. These techniques have proven effective at addressing the class imbalance problem by capturing the distributional features of minority-class samples during training and generating high-quality new samples. However, GAN-based oversampling methods may suffer from vanishing gradients, resulting in mode collapse, and may introduce noise and blur class boundaries when generating new samples. This paper proposes a novel oversampling method based on a conditional GAN (CGAN) incorporating the Wasserstein distance. It first generates an initially balanced dataset from the minority-class samples using the CGAN oversampling approach, and then applies a noise and boundary recognition method based on the K-means and k-nearest neighbors algorithms to address the noise and boundary-blurring issues. The proposed method generates new samples that are highly consistent with the original sample distribution and effectively resolves the problems of noisy data and blurred class boundaries. Experimental results on multiple public datasets show that the proposed method achieves significant improvements in evaluation metrics such as Recall, F1-score, G-mean, and AUC.
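The post-processing step described in the abstract screens generated minority samples for noise and blurred boundaries using k-nearest neighbors. The sketch below illustrates one common kNN-based screening heuristic (as used in the SMOTE family); it is an illustrative stand-in, since the abstract does not spell out the paper's exact K-means/kNN recognition rule, and the function name `flag_sample` and the voting thresholds are assumptions.

```python
import math

def flag_sample(sample, dataset, k=5):
    """Classify a generated minority sample as 'safe', 'borderline', or 'noise'
    from the class labels of its k nearest neighbors.

    dataset: list of (features, label) pairs, label 1 = minority, 0 = majority.
    NOTE: this majority-vote rule is an illustrative heuristic, not the
    paper's exact noise/boundary recognition procedure.
    """
    neighbours = sorted(dataset, key=lambda p: math.dist(sample, p[0]))[:k]
    majority = sum(1 for _, lab in neighbours if lab == 0)
    if majority == k:
        return "noise"        # surrounded entirely by majority-class points
    if majority > k // 2:
        return "borderline"   # mixed neighbourhood near the class boundary
    return "safe"             # mostly minority neighbours

if __name__ == "__main__":
    data = ([((0.0 + i, 0.0), 0) for i in range(5)]      # majority cluster
            + [((10.0 + i, 10.0), 1) for i in range(5)])  # minority cluster
    print(flag_sample((10.5, 10.0), data))  # deep in the minority region
    print(flag_sample((2.0, 0.0), data))    # inside the majority region
```

A sample flagged as noise would be discarded, while borderline samples could either be kept or regenerated, depending on the filtering policy.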
Pages: 20