Solving Multi-class Imbalance Problems Using Improved Tabular GANs

Cited by: 0
Authors
Farou, Zakarya [1]
Kopeikina, Liudmila [1]
Horvath, Tomas [1,2]
Affiliations
[1] Eotvos Lorand Univ, Inst Ind Acad Innovat, Dept Data Sci & Engn, Pazmany Peter Setany 1-C, H-1117 Budapest, Hungary
[2] Pavol Jozef Safarik Univ, Inst Comp Sci, Jesenna 6, Kosice 04001, Slovakia
Source
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2022 | 2022 / Vol. 13756
Keywords
Imbalanced learning; Generative adversarial networks; Data augmentation; Data filtering; Multi-class classification;
DOI
10.1007/978-3-031-21753-1_51
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Multi-class imbalance problems are non-standard, derivative data science problems. They stem from skewness in the underlying data distribution, which, in turn, raises numerous issues for conventional machine learning techniques. To address the lack of data in imbalance problems, one can either collect new data or oversample the underrepresented classes by synthesizing artificial data from the original instances. This paper focuses on the latter and introduces two novel tabular GAN variants for handling multi-class imbalance problems. Empirical results on three datasets from the UCI repository show that the proposed approaches, which use a filtering algorithm based on neighboring rules, improved the ability of a decision tree classifier to recognize underrepresented class instances, decreased its bias toward the majority class, and enhanced its generalization ability.
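The abstract does not spell out the filtering algorithm, so the following is only a minimal sketch of one way a neighbor-based filter for synthetic samples could look. It is not the authors' method. The function name `filter_synthetic`, the parameter `k`, the Euclidean distance, and the majority-agreement rule are all assumptions made for illustration: a synthetic row is kept only if more than half of its `k` nearest real rows carry the same class label.

```python
import numpy as np


def filter_synthetic(X_real, y_real, X_syn, y_syn, k=5):
    """Hypothetical neighbor-based filter (illustrative, not the paper's algorithm).

    Keeps a synthetic row only if more than half of its k nearest
    real rows (Euclidean distance) share the synthetic row's label.
    """
    X_real = np.asarray(X_real, dtype=float)
    y_real = np.asarray(y_real)
    X_syn = np.asarray(X_syn, dtype=float)
    y_syn = np.asarray(y_syn)

    k = min(k, len(X_real))  # cannot use more neighbors than real rows

    # Pairwise distances, shape (n_synthetic, n_real).
    dists = np.linalg.norm(X_syn[:, None, :] - X_real[None, :, :], axis=2)

    # Indices of the k closest real rows for each synthetic row.
    nn_idx = np.argsort(dists, axis=1)[:, :k]

    # Fraction of those neighbors whose label matches the synthetic label.
    agreement = (y_real[nn_idx] == y_syn[:, None]).mean(axis=1)

    keep = agreement > 0.5
    return X_syn[keep], y_syn[keep]
```

In an oversampling pipeline, a filter of this kind would sit between the generator (for example a tabular GAN) and the classifier, so that synthetic minority rows landing inside another class's region are discarded before training.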
Pages: 527-539
Number of pages: 13