Boosting methods for multi-class imbalanced data classification: an experimental review

Cited by: 0

Authors:

Jafar Tanha

Yousef Abdi

Negin Samadi

Nazila Razzaghi

Mohammad Asadpour

Affiliation:

[1] Faculty of Electrical and Computer Engineering, University of Tabriz

Source:

Journal of Big Data, Vol. 7

Keywords:

Boosting algorithms; Imbalanced data; Multi-class classification; Ensemble learning

DOI:

Not available

Abstract:

Because canonical machine learning algorithms assume that each class in a dataset contains roughly the same number of samples, efficiently discriminating minority-class samples in imbalanced datasets makes binary classification very challenging. Researchers have therefore paid considerable attention to this problem and proposed many methods to address it, which can be broadly categorized into data-level and algorithm-level approaches. Multi-class imbalanced learning is considerably harder than the binary case and remains an open problem. Boosting algorithms are a class of ensemble learning methods that improve on the performance of individual base learners by combining them into a composite model. This paper reviews the most significant published boosting techniques for multi-class imbalanced datasets. A thorough empirical comparison is conducted to analyze the performance of binary and multi-class boosting algorithms on various multi-class imbalanced datasets. In addition, based on the results obtained for the performance evaluation metrics and a recently proposed criterion for comparing metrics, the selected metrics are compared to determine a suitable performance metric for multi-class imbalanced datasets. The experimental studies show that CatBoost outperforms the other boosting algorithms on conventional multi-class imbalanced datasets, and LogitBoost does so on big multi-class imbalanced datasets. Furthermore, the multi-class Matthews correlation coefficient (MMCC) is a better evaluation metric than the multi-class AUC (MAUC) and the G-mean in multi-class imbalanced data domains.
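To make the metric comparison concrete, the following is a minimal sketch (not the authors' code) of two of the compared metrics computed from a confusion matrix. It assumes the common definitions: G-mean as the geometric mean of per-class recalls, and MMCC as the multi-class generalization of the Matthews correlation coefficient (Gorodkin's R_K statistic, the same formula scikit-learn's `matthews_corrcoef` applies to multi-class input). MAUC is omitted because it requires class-probability scores rather than hard labels. The function names `confusion_matrix`, `multiclass_gmean` and `multiclass_mcc` are hypothetical helpers introduced only for illustration.

```python
import math
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def multiclass_gmean(cm: List[List[int]]) -> float:
    """Geometric mean of per-class recalls (assumed definition).

    A class with zero recall (or no samples) contributes 0.0, so the
    G-mean collapses to 0 whenever any class is never recognized.
    """
    recalls = []
    for k, row in enumerate(cm):
        support = sum(row)
        recalls.append(cm[k][k] / support if support else 0.0)
    return math.prod(recalls) ** (1.0 / len(recalls))


def multiclass_mcc(cm: List[List[int]]) -> float:
    """Multi-class MCC (Gorodkin's R_K) computed from a confusion matrix."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # total number of samples
    c = sum(cm[i][i] for i in range(k))                      # correctly classified samples
    t = [sum(cm[i]) for i in range(k)]                       # true count per class (row sums)
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # predicted count per class (column sums)
    numerator = c * s - sum(pk * tk for pk, tk in zip(p, t))
    denominator = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    # By convention, return 0.0 when the denominator vanishes
    # (e.g. every sample is assigned to a single class).
    return numerator / denominator if denominator else 0.0


if __name__ == "__main__":
    # Imbalanced 3-class toy labels: 8 majority, 3 and 1 minority samples.
    y_true = [0] * 8 + [1] * 3 + [2]
    always_majority = [0] * 12  # accuracy would be 8/12, about 0.67
    cm = confusion_matrix(y_true, always_majority, n_classes=3)
    print(multiclass_gmean(cm), multiclass_mcc(cm))  # 0.0 0.0
```

The toy run shows why accuracy is avoided in this setting: a classifier that always predicts the majority class reaches about 67% accuracy, yet both the G-mean and the MMCC score it as 0.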
