Overlap-Based Undersampling for Improving Imbalanced Data Classification

Cited by: 49
Authors
Vuttipittayamongkol, Pattaramon [1]
Elyan, Eyad [1]
Petrovski, Andrei [1]
Jayne, Chrisina [2]
Affiliations
[1] Robert Gordon Univ, Aberdeen, Scotland
[2] Oxford Brookes Univ, Oxford, England
Source
INTELLIGENT DATA ENGINEERING AND AUTOMATED LEARNING - IDEAL 2018, PT I | 2018 / Vol. 11314
Keywords
Undersampling; Overlap; Imbalanced data; Classification; Fuzzy C-means; Resampling;
DOI
10.1007/978-3-030-03493-1_72
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Classification of imbalanced data remains an important field in machine learning. Several methods have been proposed to address the class imbalance problem including data resampling, adaptive learning and cost adjusting algorithms. Data resampling methods are widely used due to their simplicity and flexibility. Most existing resampling techniques aim at rebalancing class distribution. However, class imbalance is not the only factor that impacts the performance of the learning algorithm. Class overlap has proved to have a higher impact on the classification of imbalanced datasets than the dominance of the negative class. In this paper, we propose a new undersampling method that eliminates negative instances from the overlapping region and hence improves the visibility of the minority instances. Testing and evaluating the proposed method using 36 public imbalanced datasets showed statistically significant improvements in classification performance.
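The abstract does not give the algorithm itself, so the following is only a minimal sketch of the general idea (removing negative instances that fall in the overlapping region), not the authors' method. It assumes a two-cluster fuzzy C-means, in line with the "Fuzzy C-means" keyword, whose centers are initialised from the class means; a negative instance is dropped when its membership in the positive-side cluster exceeds a threshold. The function names, the `threshold` value of 0.5, the class-mean initialisation, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def fcm_memberships(X, centers, m=2.0, n_iter=100, tol=1e-6):
    """Run standard fuzzy C-means updates from the given initial centers.

    Returns the final centers and the (n_samples, n_clusters) membership matrix.
    """
    centers = np.asarray(centers, dtype=float).copy()
    for _ in range(n_iter):
        # Distance of every sample to every center (clamped to avoid division by zero).
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        # Membership update: u_ik is proportional to d_ik^(-2/(m-1)); rows sum to 1.
        inv = dist ** (-2.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)
        # Center update: mean of the samples weighted by u_ik^m.
        W = U ** m
        new_centers = (W.T @ X) / W.sum(axis=0)[:, None]
        shift = np.abs(new_centers - centers).max()
        centers = new_centers
        if shift < tol:
            break
    return centers, U


def overlap_undersample(X, y, positive_label=1, threshold=0.5):
    """Drop negative instances that mostly belong to the positive-side cluster.

    Assumption (not from the paper): cluster 0 starts at the negative-class mean
    and cluster 1 at the positive-class mean, so column 1 of the membership
    matrix is read as "membership in the positive-side cluster".
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    pos = y == positive_label
    init = np.vstack([X[~pos].mean(axis=0), X[pos].mean(axis=0)])
    _, U = fcm_memberships(X, init)
    in_overlap = (~pos) & (U[:, 1] > threshold)  # negatives in the overlapping region
    keep = ~in_overlap
    return X[keep], y[keep]


# Synthetic example: 200 clearly separated negatives, 10 negatives that sit
# on top of the 20 positives, and the positives themselves.
rng = np.random.default_rng(0)
X_neg = rng.normal(0.0, 0.5, size=(200, 2))
X_neg_overlap = rng.normal(3.0, 0.3, size=(10, 2))
X_pos = rng.normal(3.0, 0.3, size=(20, 2))
X = np.vstack([X_neg, X_neg_overlap, X_pos])
y = np.array([0] * 210 + [1] * 20)

X_res, y_res = overlap_undersample(X, y)
print(len(y), "->", len(y_res))
```

In this sketch every positive instance is always kept; only negatives can be removed. A stricter or looser `threshold` trades how many borderline negatives are discarded against how much majority-class information is lost, and the appropriate value would have to be tuned per dataset.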
Pages: 689-697
Number of pages: 9