Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data

Cited by: 2
Authors
Zhao, Jiakun [1]
Jin, Ju [1]
Zhang, Yibo [1]
Zhang, Ruifeng [1]
Chen, Si [1]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian, Shaanxi, Peoples R China
Keywords
multi-class; imbalanced data; ensemble method; random balance based on average size; classification
DOI
10.3233/IDA-215874
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The imbalanced data problem is widespread in real-world applications. Ignoring class imbalance when training machine learning models degrades their performance. Researchers have proposed many methods for handling imbalanced data, but these methods mainly address two-class classification tasks; learning from multi-class imbalanced data sets is still an open problem. In this paper, we put forward an ensemble method for classifying multi-class imbalanced data sets, called multi-class WHMBoost. It extends WHMBoost, which we proposed earlier. Instead of the data-processing algorithm used in WHMBoost, it uses random balance based on average size to balance the data distribution. The weak classifiers used in the boosting algorithm are a support vector machine and a decision tree classifier. During training, they participate with given weights so that their strengths complement each other. On 18 multi-class imbalanced data sets, we compared the performance of multi-class WHMBoost with state-of-the-art ensemble algorithms, using MAUC, MG-mean and MMCC as evaluation criteria. The results demonstrate that it has clear advantages over these algorithms and can effectively deal with multi-class imbalanced data sets.
Pages: 599-614
Page count: 16
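The abstract names MG-mean and MMCC as evaluation criteria but does not define them. The sketch below assumes they carry their usual meanings in the imbalanced-learning literature: MG-mean as the geometric mean of per-class recalls, and MMCC as the multi-class Matthews correlation coefficient computed from class counts. The function names `mg_mean` and `mmcc` are hypothetical, and nothing here is taken from the paper's own code. MAUC is omitted because it needs class scores rather than hard labels.

```python
from collections import Counter
from math import sqrt


def mg_mean(y_true, y_pred):
    """Geometric mean of per-class recalls (assumed definition of MG-mean)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    product = 1.0
    for label, count in support.items():
        # Recall of this class; a class that is never recovered gives 0.
        product *= hits[label] / count
    return product ** (1.0 / len(support))


def mmcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient (assumed definition of MMCC)."""
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    labels = set(true_counts) | set(pred_counts)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(v * v for v in true_counts.values())
    cov_pp = n * n - sum(v * v for v in pred_counts.values())
    denom = sqrt(cov_tt * cov_pp)
    # Convention: return 0 when the coefficient is undefined,
    # for example when every sample is assigned to one class.
    return cov_tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only; not data from the paper.
    y_true = [0, 0, 0, 0, 1, 1, 2]
    y_pred = [0, 0, 0, 1, 1, 1, 2]
    print(mg_mean(y_true, y_pred), mmcc(y_true, y_pred))
```

Both measures penalise a model that ignores minority classes: predicting only the majority class yields 0 for each, even though plain accuracy would look acceptable on a skewed data set.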