Class Imbalance Problem: A Wrapper-Based Approach using Under-Sampling with Ensemble Learning

Cited by: 1
Authors
Sikora, Riyaz [1]
Lee, Yoon Sang [2]
Affiliations
[1] Univ Texas Arlington, 701 S Nedderman Dr, Arlington, TX 76019 USA
[2] Columbus State Univ, 4225 Univ Ave, Columbus, GA 31907 USA
Funding
UK Research and Innovation;
Keywords
Class Imbalance; Data Mining; Machine Learning; DATA-SETS; DATA CLASSIFICATION; MINORITY CLASS; SMOTE; DATASETS; ALGORITHM; MACHINE;
DOI
10.1007/s10796-024-10533-7
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Imbalanced data sets are a growing problem in data mining and business analytics, because the ability of machine learning algorithms to predict the minority class deteriorates in the presence of class imbalance. Although many approaches to the imbalance problem have been studied in the literature, most have met with limited success. In this study, we propose three methods based on a wrapper approach that combine under-sampling with ensemble learning to improve the performance of standard data mining algorithms. We test our ensemble methods on 10 data sets collected from the UCI repository, each with an imbalance ratio of at least 70%. We compare their performance with that of two traditional techniques for dealing with the imbalance problem and show significant improvement in recall, AUROC, and the average of precision and recall.
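The abstract does not spell out the three proposed methods, so the following is only a generic sketch of the underlying idea: train each ensemble member on a balanced, randomly under-sampled subset and combine members by majority vote. It should not be read as the authors' algorithm. Every name here (`undersample`, `NearestCentroid`, `fit_ensemble`, `predict`, `precision_recall`), the toy nearest-centroid base learner, and the choice of 11 members are assumptions made for illustration.

```python
import random
from collections import Counter


def undersample(X, y, minority_label, rng):
    """Keep every minority row plus an equally sized random draw of majority rows."""
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    keep = minority + rng.sample(majority, min(len(minority), len(majority)))
    return [X[i] for i in keep], [y[i] for i in keep]


class NearestCentroid:
    """Toy base learner (an assumption): predicts the class with the closest mean vector."""

    def fit(self, X, y):
        sums, counts = {}, Counter(y)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
        self.centroids = {c: [s / counts[c] for s in acc] for c, acc in sums.items()}
        return self

    def predict_one(self, row):
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(row, self.centroids[c]))
        return min(self.centroids, key=sq_dist)


def fit_ensemble(X, y, minority_label, n_members=11, seed=0):
    """Train each member on its own balanced under-sample of the data."""
    rng = random.Random(seed)
    return [
        NearestCentroid().fit(*undersample(X, y, minority_label, rng))
        for _ in range(n_members)
    ]


def predict(ensemble, row):
    """Combine the members by simple majority vote."""
    votes = Counter(member.predict_one(row) for member in ensemble)
    return votes.most_common(1)[0][0]


def precision_recall(y_true, y_pred, positive):
    """Precision and recall for the class treated as positive (here, the minority)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Because each member sees a different random subset of the majority class, the ensemble as a whole uses more of the majority data than a single under-sampled model would. The average of precision and recall mentioned in the abstract would then be `(precision + recall) / 2` on the values returned by `precision_recall`.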
Pages: 16