Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data

Cited: 2
Authors
Zhao, Jiakun [1 ]
Jin, Ju [1 ]
Zhang, Yibo [1 ]
Zhang, Ruifeng [1 ]
Chen, Si [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian, Shaanxi, Peoples R China
Keywords
multi-class; imbalanced data; ensemble method; random balance based on average size; classification
DOI
10.3233/IDA-215874
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
The imbalanced data problem is widespread in the real world. Ignoring it when training machine learning models degrades their performance. Researchers have proposed many methods for handling imbalanced data, but these mainly target two-class classification tasks; learning from multi-class imbalanced data sets remains an open problem. This paper puts forward an ensemble method for classifying multi-class imbalanced data sets, called multi-class WHMBoost, an extension of the WHMBoost algorithm we proposed earlier. Instead of the data-processing algorithm used in WHMBoost, it balances the data distribution with random balance based on average size. The weak classifiers used in the boosting algorithm are a support vector machine and a decision tree; during training they participate with given weights so that their advantages complement each other. On 18 multi-class imbalanced data sets, we compared the performance of multi-class WHMBoost with state-of-the-art ensemble algorithms using MAUC, MG-mean, and MMCC as evaluation criteria. The results demonstrate that it has clear advantages over these algorithms and can effectively handle multi-class imbalanced data sets.
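The record does not spell out the resampling procedure, so the following is only a minimal sketch of one plausible reading of "random balance based on average size": each class is resampled to a random size drawn between its own size and the average class size, oversampling minority classes with replacement. The function name `random_balance_avg` and all implementation details are our own assumptions, not the authors' code.

```python
import numpy as np

def random_balance_avg(X, y, seed=None):
    """Hypothetical sketch of average-size random balancing: resample each
    class to a random target size between its own size and the mean class
    size, so every round sees a differently (re)balanced distribution."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    avg = int(round(counts.mean()))          # average class size
    picked = []
    for c, n in zip(classes, counts):
        idx = np.flatnonzero(y == c)
        lo, hi = min(n, avg), max(n, avg)
        target = int(rng.integers(lo, hi + 1))   # random size toward the average
        replace = target > n                     # oversample minority classes
        picked.append(rng.choice(idx, size=target, replace=replace))
    order = np.concatenate(picked)
    return X[order], y[order]
```

In the full boosting loop described by the abstract, each round would presumably draw a fresh balanced sample like this and train the SVM and decision-tree weak learners on it, combining their predictions with the given weights.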
Pages: 599-614
Page count: 16