A Noise-Filtered Under-Sampling Scheme for Imbalanced Classification

Cited by: 192
Authors
Kang, Qi [1]
Chen, XiaoShuang [1]
Li, Sisi [2]
Zhou, MengChu [3, 4]
Affiliations
[1] Tongji Univ, Sch Elect & Informat Engn, Dept Control Sci & Engn, Shanghai 201804, Peoples R China
[2] Mercy Coll, Dept Math & Comp Sci, Dobbs Ferry, NY 10522 USA
[3] Macau Univ Sci & Technol, Inst Syst Engn, Macau 999078, Peoples R China
[4] New Jersey Inst Technol, Dept Elect & Comp Engn, Newark, NJ 07102 USA
Keywords
Big data; class imbalance; ensemble; learning method; noise filter; resampling; under-sampling; ALGORITHM; NETWORKS; ENSEMBLE; STRATEGY; MODEL; SMOTE
DOI
10.1109/TCYB.2016.2606104
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Under-sampling is a popular data preprocessing method for class imbalance problems. It balances a dataset in order to achieve a high classification rate and to avoid bias toward majority class examples, and it always uses the full minority data in a training dataset. However, some noisy minority examples may reduce the performance of classifiers. In this paper, a new under-sampling scheme is proposed that applies a noise filter before resampling. To verify its efficiency, the scheme is implemented on top of four popular under-sampling methods, i.e., Undersampling + Adaboost, RUSBoost, UnderBagging, and EasyEnsemble, and evaluated through benchmarks and significance analysis. The paper also summarizes the relationship between algorithm performance and imbalance ratio. Experimental results indicate that the proposed scheme improves the original under-sampling-based methods with statistical significance in terms of three popular metrics for imbalanced classification, i.e., the area under the curve, F-measure, and G-mean.
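The idea of filtering noisy minority examples before under-sampling can be illustrated with a minimal sketch. The abstract does not say which noise filter the authors use, so the k-nearest-neighbour rule below is an assumption made purely for illustration: a minority example is treated as noise when none of its k nearest neighbours is a minority example. The function names (knn_noise_filter, random_under_sample, g_mean, f_measure), the toy data, and the value k=3 are all hypothetical and are not taken from the paper. The two metric helpers use the standard definitions of G-mean and F-measure and assume non-zero denominators.

```python
import math
import random


def knn_noise_filter(X, y, minority_label, k=3):
    """Assumed filter: drop minority points with no minority point among
    their k nearest neighbours. Majority points are always kept."""
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            keep.append(i)
            continue
        # Indices of the k points closest to xi, excluding xi itself.
        neighbours = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: math.dist(xi, X[j]),
        )[:k]
        if any(y[j] == minority_label for j in neighbours):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def random_under_sample(X, y, minority_label, seed=0):
    """Keep every minority point and an equally sized random subset
    of the majority points."""
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    majority = [i for i, label in enumerate(y) if label != minority_label]
    sampled = rng.sample(majority, min(len(minority), len(majority)))
    idx = sorted(minority + sampled)
    return [X[i] for i in idx], [y[i] for i in idx]


def g_mean(tp, fn, tn, fp):
    """Geometric mean of the true positive and true negative rates."""
    return math.sqrt((tp / (tp + fn)) * (tn / (tn + fp)))


def f_measure(tp, fn, fp):
    """F-measure (F1) with the minority class as the positive class."""
    return 2 * tp / (2 * tp + fp + fn)


# Toy data: six majority points (label 0) near the origin, three minority
# points (label 1) near (5, 5), and one minority point at (0.4, 0.4) that
# sits inside the majority cluster and plays the role of noise.
X = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (0.2, 0.8),
     (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (0.4, 0.4)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]

# Filter first, then balance, mirroring the order described in the abstract.
X_clean, y_clean = knn_noise_filter(X, y, minority_label=1, k=3)
X_bal, y_bal = random_under_sample(X_clean, y_clean, minority_label=1)
```

In this toy run the filter discards the point at (0.4, 0.4), and the balanced set then contains the three remaining minority points plus three randomly chosen majority points. In the ensemble methods named in the abstract, the resampling step would be repeated inside each boosting or bagging round; that part is not shown here.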
Pages: 4263-4274
Number of pages: 12