Tackling class imbalance problem in binary classification using augmented Neighborhood cleaning algorithm

Cited by: 4
Authors
Al Abdouli, Nadyah Obaid [1]
Aung, Zeyar [1]
Woon, Wei Lee [1]
Svetinovic, Davor [1]
Affiliation
[1] Institute Center for Smart and Sustainable Systems (iSmart), Department of Electrical Engineering and Computer Science, Masdar Institute of Science and Technology, Abu Dhabi, United Arab Emirates
Source
Lecture Notes in Electrical Engineering | 2015 / Vol. 339
Keywords
Classification (of information) - Evolutionary algorithms - Normal distribution
DOI
10.1007/978-3-662-46578-3_98
Abstract
Many natural processes generate some observations more frequently than others. These processes produce imbalanced distributions, which bias classifiers toward the majority class because most classifiers assume a normal distribution. To address the problem of class imbalance, a number of data preprocessing techniques, generally categorized into over-sampling and under-sampling methods, have been proposed over the years. The Neighborhood Cleaning Rule (NCL) method proposed by Laurikkala is among the most popular under-sampling methods. In this paper, we augment the original NCL algorithm by cleaning the unwanted samples using the CHC evolutionary algorithm instead of the simple nearest-neighbor-based cleaning used in NCL. We name our augmented algorithm NCL+. The performance of NCL+ is compared to that of NCL on 9 imbalanced datasets using 11 different classifiers. Experimental results show noticeable accuracy improvements by NCL+ over NCL. Moreover, NCL+ is also compared to a popular over-sampling method, the Synthetic Minority Over-sampling Technique (SMOTE), and is found to offer better results as well. © Springer-Verlag Berlin Heidelberg 2015.
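For readers unfamiliar with the baseline, the nearest-neighbor-based cleaning that the abstract attributes to NCL can be sketched roughly as follows. This is a simplified binary-class illustration in plain Python, not the authors' code and not NCL+ (which replaces this cleaning step with the CHC evolutionary algorithm). The function names `knn_indices` and `ncl_binary`, the choice k=3, and the use of squared Euclidean distance are assumptions made for the example; the class-size condition that applies in the multi-class setting is omitted.

```python
from collections import Counter


def knn_indices(X, i, k):
    """Indices of the k nearest neighbours of X[i] (squared Euclidean), excluding i."""
    dists = [
        (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
        for j in range(len(X))
        if j != i
    ]
    dists.sort()  # ties are broken by the lower index
    return [j for _, j in dists[:k]]


def ncl_binary(X, y, minority, k=3):
    """Simplified binary neighbourhood cleaning (hypothetical helper).

    Returns the sorted indices of the samples that are kept.
    Only majority-class samples are ever removed.
    """
    remove = set()
    for i in range(len(X)):
        nbrs = knn_indices(X, i, k)
        predicted = Counter(y[j] for j in nbrs).most_common(1)[0][0]
        if predicted == y[i]:
            continue  # sample agrees with its neighbourhood; nothing to clean
        if y[i] != minority:
            # Majority sample misclassified by its neighbours: drop it.
            remove.add(i)
        else:
            # Minority sample misclassified: drop its majority-class neighbours.
            remove.update(j for j in nbrs if y[j] != minority)
    return [i for i in range(len(X)) if i not in remove]


# Toy data: three minority points, one noisy majority point among them,
# and a well-separated majority cluster.
X = [(0, 0), (0, 1), (1, 0), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (12, 12)]
y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
kept = ncl_binary(X, y, minority=1)
print(kept)
```

In this toy run only the majority point sitting inside the minority cluster (index 3) is cleaned away; the minority samples are never removed, which is what makes this an under-sampling method.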
Pages: 827-834