Ensemble learning via constraint projection and undersampling technique for class-imbalance problem

Cited by: 0
Authors
Huaping Guo
Jun Zhou
Chang-an Wu
Affiliations
[1] Xinyang Normal University, School of Computer and Information Technology
[2] Xinyang Normal University, Henan Key Lab. of Analysis and Applications of Education Big Data
Source
Soft Computing | 2020 / Vol. 24
Keywords
Ensemble learning; Constraint projection; Undersampling technique; Class-imbalance;
DOI
Not available
CLC number
Subject classification code
Abstract
Ensemble learning is an effective technique for the class-imbalance problem, and the key to obtaining a successful ensemble is to create individual base classifiers with high accuracy and diversity. In this paper, we propose a novel ensemble learning method via constraint projection and undersampling technique, which constructs each base classifier in two steps: (1) constructing a set of pairwise constraints by undersampling examples from the minority/majority class set and learning a projection matrix from that pairwise constraint set, and (2) undersampling the original training set to obtain a new training set, on which a base classifier is constructed in the new feature space defined by the projection matrix. In the first step, the projection matrix is mainly used to enhance the separability between examples of different classes and thus to improve the performance of the base classifier, while the undersampling technique is used to create diverse sets of pairwise constraints for training diverse projection matrices, thus introducing diversity into the base classifiers. In the second step, the undersampling technique aims to improve the performance of the base classifiers on the minority class and to further increase the diversity among the individual base classifiers. Experimental results on 29 datasets with various data distributions and imbalance ratios show that the proposed method performs significantly better than other state-of-the-art methods in terms of recall, g-mean, f-measure and AUC.
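The two steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the projection objective, the sampling sizes or the base learner, so everything below is an assumption made for illustration. In particular, the projection is taken from the top eigenvectors of the cannot-link scatter minus the must-link scatter, undersampling draws an equal number of examples per class, the base classifier is a nearest-centroid rule, labels are assumed to be 0 (majority) and 1 (minority), and all function names are hypothetical.

```python
import numpy as np

def balanced_undersample(y, rng):
    """Return indices with every class cut down to the minority-class size (assumed scheme)."""
    classes, counts = np.unique(y, return_counts=True)
    n = counts.min()
    idx = [rng.choice(np.flatnonzero(y == c), size=n, replace=False) for c in classes]
    return np.concatenate(idx)

def learn_projection(X, y, n_pairs, n_components, rng):
    """Learn a projection matrix from random pairwise constraints (assumed objective)."""
    i = rng.integers(0, len(X), size=n_pairs)
    j = rng.integers(0, len(X), size=n_pairs)
    diff = X[i] - X[j]
    same = y[i] == y[j]  # True: must-link pair, False: cannot-link pair
    S_c = diff[~same].T @ diff[~same] / max((~same).sum(), 1)
    S_m = diff[same].T @ diff[same] / max(same.sum(), 1)
    # Directions that spread cannot-link pairs and shrink must-link pairs.
    vals, vecs = np.linalg.eigh(S_c - S_m)
    return vecs[:, np.argsort(vals)[::-1][:n_components]]

def fit_ensemble(X, y, n_estimators=10, n_pairs=200, n_components=2, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_estimators):
        # Step 1: undersample, build pairwise constraints, learn a projection.
        s = balanced_undersample(y, rng)
        W = learn_projection(X[s], y[s], n_pairs, n_components, rng)
        # Step 2: undersample again and train a base classifier in the projected space.
        t = balanced_undersample(y, rng)
        Z = X[t] @ W
        classes = np.unique(y[t])
        centroids = np.stack([Z[y[t] == c].mean(axis=0) for c in classes])
        members.append((W, classes, centroids))
    return members

def predict(members, X):
    """Majority vote over the base classifiers (binary labels 0/1 assumed)."""
    votes = []
    for W, classes, centroids in members:
        Z = X @ W
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        votes.append(classes[dist.argmin(axis=1)])
    return (np.stack(votes).mean(axis=0) >= 0.5).astype(int)

# Synthetic imbalanced data (10:1), for illustration only.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (200, 5)), rng.normal(3.0, 1.0, (20, 5))])
y = np.array([0] * 200 + [1] * 20)

members = fit_ensemble(X, y)
pred = predict(members, X)
recall = (pred[y == 1] == 1).mean()  # minority-class recall on the training data
print("minority recall:", recall)
```

Because each ensemble member draws its own constraint set and its own training subset, the members differ both in their projection matrices and in the data they are fitted on, which is the source of diversity the abstract refers to. Any other base learner could be substituted for the nearest-centroid rule.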
Pages: 4711-4727
Page count: 16