Oversampling method based on GAN for tabular binary classification problems

Cited by: 1

Authors:

Yang, Jie [1]

Jiang, Zhenhao [2]

Pan, Tingting [1]

Chen, Yueqi [1]

Pedrycz, Witold [3]

Affiliations:

[1] Dalian Univ Technol, Sch Math Sci, Dalian, Liaoning, Peoples R China

[2] Chinese Univ Hong Kong Shenzhen, Sch Data Sci, Shenzhen, Guangdong, Peoples R China

[3] Univ Alberta, Dept Elect & Comp Engn, Edmonton, AB, Canada

Source:

INTELLIGENT DATA ANALYSIS | 2023, Vol. 27, No. 5

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

Keywords:

Oversampling; GAN; imbalanced learning; IMBALANCED DATASETS; SMOTE; TOOL;

DOI:

10.3233/IDA-220383

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Class imbalance arises in many applications. A large gap between the number of samples in different classes skews classifiers toward the majority class, which degrades learning performance and the quality of the results. Most data-level imbalanced learning approaches generate new samples using only the information in the minority samples, either by linear interpolation or by fitting the data distribution. In contrast, we propose a novel oversampling method based on generative adversarial networks (GANs), named OS-GAN. In this method, the GAN learns the distribution characteristics of the minority class from selected majority samples rather than from random noise. As a result, samples produced by the trained generator carry information from both the majority and minority classes. Furthermore, the central regularization keeps the distribution of the synthetic samples from being restricted to the domain of the minority class, which can improve the generalization of learning models or algorithms. Experimental results on 14 datasets and one high-dimensional dataset show that OS-GAN outperforms 14 commonly used resampling techniques in terms of G-mean, accuracy, and F1-score.
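
The abstract above reports results in terms of G-mean, accuracy, and F1-score. As a minimal sketch of why accuracy alone can mislead on imbalanced data, the following computes all three for a binary problem. It assumes the common definition of G-mean as the square root of sensitivity times specificity, which the record itself does not spell out; the function names `confusion_counts` and `imbalance_metrics` are hypothetical and not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN, treating `positive` as the minority class label."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def imbalance_metrics(y_true, y_pred, positive=1):
    """Return accuracy, G-mean, and F1 (assumed definitions, see lead-in)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on minority
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on majority
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    accuracy = (tp + tn) / len(y_true)
    g_mean = math.sqrt(sensitivity * specificity)
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {"accuracy": accuracy, "g_mean": g_mean, "f1": f1}


# Toy data: 90 majority samples (label 0) and 10 minority samples (label 1).
y_true = [0] * 90 + [1] * 10

# A classifier that always predicts the majority class scores high accuracy
# while its G-mean and F1 are both zero.
print(imbalance_metrics(y_true, [0] * 100))
```

On this toy data the always-majority predictor reaches 0.9 accuracy while both G-mean and F1 are zero, which is one reason imbalanced-learning studies usually report these metrics alongside accuracy.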


Pages: 1287-1308

Page count: 22
