A novel ensemble method for classifying imbalanced data

Cited by: 318
Authors
Sun, Zhongbin [1]
Song, Qinbao [1]
Zhu, Xiaoyan [1]
Sun, Heli [1]
Xu, Baowen [2]
Zhou, Yuming [2]
Affiliations
[1] Xi An Jiao Tong Univ, Dept Comp Sci & Technol, Xian 710049, Peoples R China
[2] Nanjing Univ, Dept Comp Sci & Technol, Nanjing 210093, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Imbalanced data; Classification; Ensemble learning; NEURAL-NETWORKS; SOFTWARE TOOL; DATA SETS; CLASSIFICATION; ALGORITHMS; ACCURACY; SMOTE; KEEL;
DOI
10.1016/j.patcog.2014.11.014
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class imbalance problems have been reported to severely hinder the classification performance of many standard learning algorithms and have attracted a great deal of attention from researchers in different fields. A number of methods, such as sampling methods, cost-sensitive learning methods, and bagging- and boosting-based ensemble methods, have therefore been proposed to solve these problems. However, these conventional class imbalance handling methods may lose potentially useful information, introduce unexpected mistakes, or increase the likelihood of overfitting, because they may alter the original data distribution. We therefore propose a novel ensemble method, which first converts an imbalanced data set into multiple balanced ones and then builds a number of classifiers on these data sets with a specific classification algorithm. Finally, the classification results of these classifiers for new data are combined by a specific ensemble rule. In the empirical study, our method was compared with different class imbalance handling methods: three conventional sampling methods, one cost-sensitive learning method, six Bagging- and Boosting-based ensemble methods, our previous method EM1vs1, and two fuzzy-rule-based classification methods. The experimental results on 46 imbalanced data sets show that the proposed method is usually superior to the conventional imbalance handling methods on highly imbalanced problems. (C) 2014 Elsevier Ltd. All rights reserved.
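The three steps named in the abstract (convert one imbalanced data set into several balanced ones, train one classifier per balanced set, combine the predictions) can be illustrated with a minimal sketch. The abstract does not say how the data are split, which base algorithm is used, or which ensemble rule is applied, so everything concrete here is an assumption made for illustration: the random partition of the majority class in `split_balanced`, the toy `NearestCentroid` base learner, and majority voting in `predict_ensemble` are hypothetical choices, not the authors' method.

```python
import random
from collections import Counter


def split_balanced(X, y, minority_label, seed=0):
    """Assumed splitting rule: shuffle the majority examples, partition them
    into chunks of roughly minority-class size, and pair every chunk with
    all minority examples. No example is discarded or duplicated within
    the majority class."""
    rng = random.Random(seed)
    minority = [(x, l) for x, l in zip(X, y) if l == minority_label]
    majority = [(x, l) for x, l in zip(X, y) if l != minority_label]
    rng.shuffle(majority)
    n_bins = max(1, len(majority) // max(1, len(minority)))
    return [majority[i::n_bins] + minority for i in range(n_bins)]


class NearestCentroid:
    """Toy base learner (an assumption): predicts the label whose class
    mean is closest in squared Euclidean distance."""

    def fit(self, data):
        sums, counts = {}, Counter()
        for x, label in data:
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
            counts[label] += 1
        self.centroids = {
            label: [s / counts[label] for s in acc] for label, acc in sums.items()
        }
        return self

    def predict(self, x):
        def sq_dist(label):
            return sum((a - b) ** 2 for a, b in zip(x, self.centroids[label]))

        return min(self.centroids, key=sq_dist)


def fit_ensemble(X, y, minority_label, seed=0):
    """Train one base classifier per balanced subset."""
    subsets = split_balanced(X, y, minority_label, seed)
    return [NearestCentroid().fit(subset) for subset in subsets]


def predict_ensemble(models, x):
    """Assumed ensemble rule: plain majority vote over the base classifiers."""
    votes = Counter(model.predict(x) for model in models)
    return votes.most_common(1)[0][0]
```

With 40 majority and 4 minority examples, this sketch yields 10 balanced subsets of 8 examples each and therefore 10 base classifiers. Majority voting can tie when the number of classifiers is even; a real implementation would need an explicit tie-breaking rule or a score-based combination instead.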
Pages: 1623-1637
Page count: 15