A New Under-Sampling Method to Face Class Overlap and Imbalance

Times cited: 33
Authors
Guzman-Ponce, Angelica [1]
Valdovinos, Rosa Maria [1]
Sanchez, Jose Salvador [2]
Marcial-Romero, Jose Raymundo [1]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Cerro Coatepec S-N,Ciudad Univ, Toluca 50100, Mexico
[2] Univ Jaume 1, Dept Comp Languages & Syst, Inst New Imaging Technol, Castellon de La Plana 12071, Spain
Source
APPLIED SCIENCES-BASEL | 2020 / Volume 10 / Issue 15
Keywords
class imbalance; class overlap; under-sampling; clustering; DBSCAN; minimum spanning tree; CLASSIFICATION; NETWORKS; IDENTIFICATION; PERFORMANCE; DATASETS; NOISY;
DOI
10.3390/app10155164
Chinese Library Classification number
O6 [Chemistry];
Subject classification code
0703;
Abstract
Class overlap and class imbalance are two data complexities that challenge the design of effective classifiers in Pattern Recognition and Data Mining as they may cause a significant loss in performance. Several solutions have been proposed to face both data difficulties, but most of these approaches tackle each problem separately. In this paper, we propose a two-stage under-sampling technique that combines the DBSCAN clustering algorithm to remove noisy samples and clean the decision boundary with a minimum spanning tree algorithm to face the class imbalance, thus handling class overlap and imbalance simultaneously with the aim of improving the performance of classifiers. An extensive experimental study shows a significantly better behavior of the new algorithm as compared to 12 state-of-the-art under-sampling methods using three standard classification models (nearest neighbor rule, J48 decision tree, and support vector machine with a linear kernel) on both real-life and synthetic databases.
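The two-stage idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the minimum spanning tree is used, so the second stage below is a hypothetical rule (stop Kruskal's algorithm once as many groups remain as there are minority samples, which is equivalent to cutting the longest MST edges, then keep one representative per group). Applying the DBSCAN noise test to the majority class only is likewise an assumption, and every function and parameter name (`dbscan_noise_mask`, `mst_representatives`, `two_stage_undersample`, `eps`, `min_pts`) is illustrative. Points are expected as tuples of coordinates.

```python
from math import dist


def dbscan_noise_mask(points, eps, min_pts):
    """Return booleans, True where DBSCAN would label the point as noise."""
    n = len(points)
    # Neighbourhoods include the point itself, as in the usual DBSCAN definition.
    neighbours = [[j for j in range(n) if dist(points[i], points[j]) <= eps]
                  for i in range(n)]
    core = [len(nb) >= min_pts for nb in neighbours]
    # Noise: neither a core point nor within eps of a core point.
    return [not core[i] and not any(core[j] for j in neighbours[i])
            for i in range(n)]


def mst_representatives(points, k):
    """Hypothetical stage 2: form k groups by stopping Kruskal's algorithm
    early, then keep the point closest to each group's centroid."""
    n = len(points)
    if n <= k:
        return list(points)
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted((dist(points[i], points[j]), i, j)
                   for i in range(n) for j in range(i + 1, n))
    components = n
    for _, i, j in edges:
        if components == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(points[i])
    kept = []
    for members in groups.values():
        centroid = tuple(sum(c) / len(members) for c in zip(*members))
        kept.append(min(members, key=lambda p: dist(p, centroid)))
    return kept


def two_stage_undersample(majority, minority, eps, min_pts):
    """Stage 1 drops majority samples flagged as noise (assumed scope);
    stage 2 reduces the rest to at most len(minority) samples."""
    noise = dbscan_noise_mask(majority, eps, min_pts)
    cleaned = [p for p, is_noise in zip(majority, noise) if not is_noise]
    return mst_representatives(cleaned, len(minority))
```

For example, with two tight groups of majority points plus one isolated point, and two minority points, `two_stage_undersample(majority, minority, eps=1.5, min_pts=3)` would discard the isolated point and return one majority sample from each group. The pairwise distance computation is quadratic, so this sketch only suits small datasets.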
Pages: 22