Multi-class WHMBoost: An ensemble algorithm for multi-class imbalanced data

Cited by: 2
Authors
Zhao, Jiakun [1 ]
Jin, Ju [1 ]
Zhang, Yibo [1 ]
Zhang, Ruifeng [1 ]
Chen, Si [1 ]
Affiliations
[1] Xi An Jiao Tong Univ, Sch Software Engn, Xian, Shaanxi, Peoples R China
Keywords
multi-class; imbalanced data; ensemble method; random balance based on average size; CLASSIFICATION;
DOI
10.3233/IDA-215874
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The imbalanced data problem is widespread in the real world. Ignoring it when training machine learning models degrades model performance. Researchers have proposed many methods for handling imbalanced data, but these methods mainly focus on two-class classification tasks; learning from multi-class imbalanced data sets is still an open problem. This paper puts forward an ensemble method for classifying multi-class imbalanced data sets, called multi-class WHMBoost, an extension of the WHMBoost algorithm that we proposed earlier. Instead of the data processing algorithm used in WHMBoost, it uses random balance based on average size to balance the data distribution. The weak classifiers used in the boosting algorithm are a support vector machine and a decision tree classifier, which participate in training with given weights so that their strengths complement each other. On 18 multi-class imbalanced data sets, we compared the performance of multi-class WHMBoost with state-of-the-art ensemble algorithms using MAUC, MG-mean and MMCC as evaluation criteria. The results demonstrate that it has clear advantages over these algorithms and can effectively deal with multi-class imbalanced data sets.
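The abstract names two ideas that a small sketch can illustrate: balancing classes toward the average class size, and scoring with MG-mean. The Python below is an illustration under stated assumptions, not the authors' implementation. The helper `resample_to_average_size` is hypothetical and assumes that "random balance based on average size" means undersampling classes larger than the average and oversampling smaller ones with replacement; `mg_mean` assumes MG-mean is the usual multi-class G-mean, that is, the geometric mean of the per-class recalls.

```python
import math
import random
from collections import Counter, defaultdict


def resample_to_average_size(X, y, seed=0):
    """Resample every class to the average class size (hypothetical helper).

    Assumption: classes at or above the average size are randomly
    undersampled without replacement, and smaller classes are randomly
    oversampled with replacement. The paper may differ in its details.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for features, label in zip(X, y):
        by_class[label].append(features)

    target = round(len(y) / len(by_class))  # average class size

    X_out, y_out = [], []
    for label, rows in by_class.items():
        if len(rows) >= target:
            chosen = rng.sample(rows, target)
        else:
            chosen = rows + rng.choices(rows, k=target - len(rows))
        X_out.extend(chosen)
        y_out.extend([label] * target)
    return X_out, y_out


def mg_mean(y_true, y_pred):
    """Geometric mean of the per-class recalls (assumed MG-mean definition)."""
    support = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [hits[c] / support[c] for c in support]
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    X = [[i] for i in range(12)]
    y = [0] * 8 + [1] * 3 + [2] * 1
    X_bal, y_bal = resample_to_average_size(X, y)
    print(Counter(y_bal))
    print(mg_mean([0, 0, 1, 1], [0, 1, 1, 1]))
```

Because the geometric mean collapses to zero when any single class has zero recall, a metric of this kind penalizes classifiers that ignore minority classes, which is why it is commonly reported for imbalanced data.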
Pages: 599-614
Page count: 16
Related papers
50 in total
[21]   A Partial Labeling Framework for Multi-Class Imbalanced Streaming Data [J].
Arabmakki, Elaheh ;
Kantardzic, Mehmed ;
Sethi, Tegjyot Singh .
2017 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2017, :1018-1025
[22]   A regularized ensemble framework of deep learning for cancer detection from multi-class, imbalanced training data [J].
Yuan, Xiaohui ;
Xie, Lijun ;
Abouelenien, Mohamed .
PATTERN RECOGNITION, 2018, 77 :160-172
[24]   Boosting methods for multi-class imbalanced data classification: an experimental review [J].
Tanha, Jafar ;
Abdi, Yousef ;
Samadi, Negin ;
Razzaghi, Nazila ;
Asadpour, Mohammad .
JOURNAL OF BIG DATA, 2020, 7 (01)
[25]   Adapted ensemble classification algorithm based on multiple classifier system and feature selection for classifying multi-class imbalanced data [J].
Li Yijing ;
Guo Haixiang ;
Liu Xiao ;
Li Yanan ;
Li Jinling .
KNOWLEDGE-BASED SYSTEMS, 2016, 94 :88-104
[26]   Parameter-free classification in multi-class imbalanced data sets [J].
Cerf, Loic ;
Gay, Dominique ;
Selmaoui-Folcher, Nazha ;
Cremilleux, Bruno ;
Boulicaut, Jean-Francois .
DATA & KNOWLEDGE ENGINEERING, 2013, 87 :109-129
[27]   A new data complexity measure for multi-class imbalanced classification tasks [J].
Han, Mingming ;
Guo, Husheng ;
Wang, Wenjian .
PATTERN RECOGNITION, 2025, 157
[28]   Online active learning method for multi-class imbalanced data stream [J].
Li, Ang ;
Han, Meng ;
Mu, Dongliang ;
Gao, Zhihui ;
Liu, Shujuan .
KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (04) :2355-2391
[29]   An Effective Recursive Technique for Multi-Class Classification and Regression for Imbalanced Data [J].
Alam, Tahira ;
Ahmed, Chowdhury Farhan ;
Zahin, Sabit Anwar ;
Khan, Muhammad Asif Hossain ;
Islam, Maliha Tashfia .
IEEE ACCESS, 2019, 7 :127615-127630
[30]   An oversampling method for multi-class imbalanced data based on composite weights [J].
Deng, Mingyang ;
Guo, Yingshi ;
Wang, Chang ;
Wu, Fuwei .
PLOS ONE, 2021, 16 (11)