Comparing boosting and cost-sensitive boosting with imbalanced data

Cited by: 0
Authors
Affiliation
[1] School of Information Technology, Jiangxi University of Finance and Economics, Nanchang
Keywords
AdaBoost; Class imbalance; Classification; Cost-sensitive learning
DOI
10.4156/jcit.vol7.issue21.1
CLC number
Subject classification code
Abstract
The class imbalance problem has emerged as a crucial issue in the machine learning and data mining communities, driven by the growing availability of real-world data with skewed class distributions or unequal misclassification costs for the minority and majority classes. This paper compares several boosting and cost-sensitive boosting methods on their ability to handle class imbalance, using four evaluation metrics: precision, F-measure, geometric mean (G-mean), and the area under the receiver operating characteristic curve (AUC). The experiments use five imbalanced NASA benchmark datasets (JM1, KC1, KC2, PC1, and CM1). The learning algorithms studied are logistic regression, AdaBoost, AdaC1, AdaC2, AdaC3, and cost-sensitive classifiers based on logistic regression and AdaBoost, respectively. The experimental results show that no single method can be called the best at handling class imbalance unless the evaluation metric is specified.
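The four metrics named in the abstract can be illustrated with a minimal sketch. It assumes the common two-class definitions (G-mean as the square root of sensitivity times specificity, AUC as the probability that a random positive is scored above a random negative, with ties counted as one half); the record does not state which definitions the paper uses. The function names and the toy labels are hypothetical, not taken from the paper.

```python
from math import sqrt


def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the minority class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def safe_div(a, b):
    """Divide a by b, returning 0.0 when the denominator is zero."""
    return a / b if b else 0.0


def imbalance_metrics(y_true, y_pred, positive=1):
    """Precision, recall, F-measure and G-mean (assumed standard definitions)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)        # sensitivity / true positive rate
    specificity = safe_div(tn, tn + fp)   # true negative rate
    f_measure = safe_div(2 * precision * recall, precision + recall)
    g_mean = sqrt(recall * specificity)
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "g_mean": g_mean,
    }


def auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 2 minority examples out of 10.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.1, 0.1, 0.2, 0.3, 0.05]

print(imbalance_metrics(y_true, y_pred))
print(auc(y_true, scores))
```

On this toy data, precision and F-measure are both 0.5 while AUC is 0.9375, which shows how the same classifier can look weak or strong depending on the metric chosen.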
Pages: 1 / 8
Page count: 7
Related papers
25 in total
  • [1] Wang B., Wang L., Analysis of defects propagation in software system based on weighted software networks, Journal of Convergence Information Technology, 7, 17, pp. 63-77, (2012)
  • [2] He H., Garcia E.A., Learning from imbalanced data, IEEE Transactions on Knowledge and Data Engineering, 21, 9, pp. 1263-1284, (2009)
  • [3] Liang G., Zhang C., An empirical evaluation of bagging with different algorithms on imbalanced data, Proceedings of the ADMA 2011, pp. 339-352, (2011)
  • [4] Khoshgoftaar T.M., van Hulse J., Napolitano A., Comparing boosting and bagging techniques with noisy and imbalanced data, IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, 41, 3, pp. 552-568, (2011)
  • [5] Galar M., Fernandez A., Barrenechea E., Bustince H., Herrera F., A review on ensembles for the class imbalance problem: Bagging-, boosting-, and hybrid-based approaches, IEEE Transactions on Systems, Man, and Cybernetics-Part C: Applications and Reviews, 42, 4, pp. 463-484, (2012)
  • [6] Lemnaru C., Potolea R., Imbalanced classification problems: Systematic study, issues and best practices, Lecture Notes in Business Information Processing, 102, pp. 35-50, (2012)
  • [7] Seiffert C., Khoshgoftaar T.M., van Hulse J., Folleco A., An empirical study of the classification performance of learners on imbalanced and noisy software quality data, Information Sciences
  • [8] Fan W., Stolfo S.J., Zhang J., Chan P.K., AdaCost: Misclassification cost-sensitive boosting, Proceedings of International Conference on Machine Learning, pp. 97-105, (1999)
  • [9] Chawla N.V., Bowyer K.W., Hall L.O., Philip Kegelmeyer W., SMOTE: Synthetic minority over-sampling technique, Journal of Artificial Intelligence Research, 16, pp. 321-357, (2002)
  • [10] Han H., Wang W., Mao B., Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning, Lecture Notes in Computer Science, 3644, pp. 878-887, (2005)