Comparing Boosting and Bagging Techniques With Noisy and Imbalanced Data

Cited by: 211
Authors
Khoshgoftaar, Taghi M. [1]
Van Hulse, Jason [1]
Napolitano, Amri [1]
Affiliation
[1] Florida Atlantic Univ, Boca Raton, FL 33431 USA
Source
IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans | 2011 / Vol. 41 / No. 3
Keywords
Bagging; binary classification; boosting; class imbalance; class noise; SMOTE
DOI
10.1109/TSMCA.2010.2084081
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Subject classification code
0812
Abstract
This paper compares the performance of several boosting and bagging techniques in the context of learning from imbalanced and noisy binary-class data. Noise and class imbalance are two well-established data characteristics encountered in a wide range of data mining and machine learning initiatives. The learning algorithms studied in this paper, which include SMOTEBoost, RUSBoost, Exactly Balanced Bagging, and Roughly Balanced Bagging, combine boosting or bagging with data sampling to make them more effective when data are imbalanced. These techniques are evaluated in a comprehensive suite of experiments, for which nearly four million classification models were trained. All classifiers are assessed using seven different performance metrics, providing a complete perspective on the performance of these techniques, and results are tested for statistical significance via analysis-of-variance modeling. The experiments show that the bagging techniques generally outperform boosting, and hence in noisy data environments, bagging is the preferred method for handling class imbalance.
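The abstract names Exactly Balanced Bagging and Roughly Balanced Bagging without spelling out how each bag is drawn. The Python below is a minimal sketch under stated assumptions, following the common formulations rather than the exact implementation evaluated in the paper: EBB keeps every minority example and adds an equal-size random subset of the majority class drawn without replacement, while RBB draws the majority sample size from a negative binomial distribution NB(n_min, 0.5), whose mean equals the minority count, and samples both classes with replacement. The function names `exactly_balanced_bag`, `roughly_balanced_bag`, `balanced_bagging`, and the 1-D threshold learner `fit_threshold` are hypothetical, and label 1 is assumed to mark the minority class.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Model = Callable[[float], int]


def negative_binomial(r: int, p: float, rng: random.Random) -> int:
    """Count failures before the r-th success in Bernoulli(p) trials."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures


def split_classes(labels: Sequence[int]):
    """Index lists for the minority (label 1) and majority (label 0) class."""
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    return minority, majority


def exactly_balanced_bag(labels: Sequence[int], rng: random.Random) -> List[int]:
    """EBB bag (assumed form): all minority indices plus an equal number of
    majority indices drawn without replacement."""
    minority, majority = split_classes(labels)
    return minority + rng.sample(majority, len(minority))


def roughly_balanced_bag(labels: Sequence[int], rng: random.Random) -> List[int]:
    """RBB bag (assumed form): majority sample size ~ NB(n_min, 0.5), whose
    mean is n_min; both classes are then sampled with replacement."""
    minority, majority = split_classes(labels)
    n_maj = negative_binomial(len(minority), 0.5, rng)
    return (rng.choices(minority, k=len(minority))
            + rng.choices(majority, k=n_maj))


def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> Model:
    """Hypothetical 1-D base learner: threshold halfway between class means,
    assuming the minority class has the larger feature values."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    if not pos or not neg:  # a bag can lack a class; predict the one present
        only = 1 if pos else 0
        return lambda x: only
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x >= t else 0


def balanced_bagging(xs, ys, n_bags, bag_fn, fit, seed=0) -> Model:
    """Train one base model per balanced bag; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_bags):
        idx = bag_fn(ys, rng)
        models.append(fit([xs[i] for i in idx], [ys[i] for i in idx]))
    # Predict 1 only if strictly more than half of the models vote 1.
    return lambda x: int(2 * sum(m(x) for m in models) > len(models))


# Toy data (hypothetical): 100 points, the 10 largest are the minority class.
xs = [float(i) for i in range(100)]
ys = [1 if x >= 90 else 0 for x in xs]

bag = exactly_balanced_bag(ys, random.Random(0))
print(Counter(ys[i] for i in bag))  # 10 minority and 10 majority indices

clf = balanced_bagging(xs, ys, 25, roughly_balanced_bag, fit_threshold)
print(clf(0.0), clf(99.0))
```

The design choice is to rebalance inside each bootstrap iteration rather than once up front: every base learner sees a balanced sample, and the vote aggregates over many different majority-class subsets instead of discarding the same majority examples every time.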
Pages: 552-568
Page count: 17
References
45 in total
  • [1] [Anonymous], P SIAM INT C DAT MIN
  • [2] [Anonymous], P ICML 2003 WORKSH L
  • [3] [Anonymous], 2007, UCI machine learning repository
  • [4] ANYFANTIS D, 2007, IFIP INT FEDERATION
  • [5] Barandela R, 2004, LECT NOTES COMPUT SC, V3138, P806
  • [6] Berenson M.L., 1983, INTERMEDIATE STAT ME
  • [7] Brodley, CE; Friedl, MA. Identifying mislabeled training data. JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 1999, 11: 131-167
  • [8] Chawla, Nitesh V.; Bowyer, Kevin W.; Hall, Lawrence O.; Kegelmeyer, W. Philip. SMOTE: Synthetic minority over-sampling technique. American Association for Artificial Intelligence, 2002, 16
  • [9] Chawla, NV; Lazarevic, A; Hall, LO; Bowyer, KW. SMOTEBoost: Improving prediction of the minority class in boosting. KNOWLEDGE DISCOVERY IN DATABASES: PKDD 2003, PROCEEDINGS, 2003, 2838: 107-119
  • [10] DAVIDSON I, 2006, P 10 EUR C PRINC PRA, P478