Re-sampling of multi-class imbalanced data using belief function theory and ensemble learning

Cited by: 11
Authors
Grina, Fares [1 ,2 ]
Elouedi, Zied [1 ]
Lefevre, Eric [2 ]
Affiliations
[1] LARODEC, Inst Super Gest Tunis, Tunis, Tunisia
[2] Univ Artois, Lab Genie Informat & Automat Artois LGI2A, UR 3926, F-62400 Bethune, France
Keywords
Imbalanced classification; Ensemble learning; Re-sampling; Evidence theory; Classification; SMOTE; Prediction
DOI
10.1016/j.ijar.2023.02.006
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Imbalanced classification refers to problems in which significantly more instances are available for some classes than for others. Such scenarios require special attention because traditional classifiers tend to be biased towards the majority class, which has a large number of examples. Different strategies, such as re-sampling, have been suggested to improve imbalanced learning. Ensemble methods have also been shown to yield promising results in the presence of class imbalance. However, most of them only deal with binary imbalanced datasets. In this paper, we propose a re-sampling approach based on belief function theory and ensemble learning for dealing with class imbalance in the multi-class setting. The technique assigns a soft evidential label to each instance. This evidential modeling provides more information about the region each object lies in, which improves the selection of objects during both undersampling and oversampling. Our approach first selects ambiguous majority instances for undersampling, then oversamples minority objects by generating synthetic examples in borderline regions to strengthen minority class borders. Finally, to improve the induced results, the proposed re-sampling approach is incorporated into an evidential, classifier-independent, fusion-based ensemble. A comparative study against well-known ensemble methods shows that our method is effective according to the G-Mean and F1-score measures, independently of the chosen classifier.
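To make the pipeline described in the abstract concrete, below is a minimal sketch of evidentially guided re-sampling. It is an assumption-laden simplification, not the authors' algorithm: soft labels are approximated by k-nearest-neighbour class frequencies instead of a belief-function mass assignment combined with Dempster's rule, and the names knn_soft_labels, evidential_resample, and the ambiguity threshold are hypothetical.

# Minimal sketch (assumed, not the authors' exact algorithm): evidential soft
# labels are approximated by k-NN class frequencies; the paper instead builds
# belief-function masses and combines them with Dempster's rule.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_soft_labels(X, y, classes, k=5):
    # Soft label per instance = class proportions among its k nearest
    # neighbours (a crude stand-in for an evidential mass assignment).
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    _, idx = nn.kneighbors(X)           # column 0 is the point itself
    neigh_y = y[idx[:, 1:]]
    return np.stack([(neigh_y == c).mean(axis=1) for c in classes], axis=1)

def evidential_resample(X, y, k=5, ambiguity=0.5, seed=0):
    # Step 1: undersample ambiguous majority instances.
    # Step 2: oversample borderline minority instances (SMOTE-style
    # interpolation towards same-class neighbours).
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    majority = classes[np.argmax([(y == c).sum() for c in classes])]
    soft = knn_soft_labels(X, y, classes, k)
    own = soft[np.arange(len(y)), np.searchsorted(classes, y)]

    # Undersampling: drop majority points whose own-class support is weak.
    keep = ~((y == majority) & (own < ambiguity))
    Xk, yk = X[keep], y[keep]

    # Oversampling: interpolate new minority points around borderline ones.
    soft_k = knn_soft_labels(Xk, yk, classes, k)
    new_X, new_y = [], []
    n_major = int((yk == majority).sum())
    for c in classes:
        if c == majority:
            continue
        mask = yk == c
        Xc = Xk[mask]
        if len(Xc) < 2:
            continue
        own_c = soft_k[mask, np.searchsorted(classes, c)]
        border = Xc[(own_c > 0.0) & (own_c < 1.0)]   # mixed neighbourhoods
        deficit = n_major - int(mask.sum())
        if len(border) == 0 or deficit <= 0:
            continue
        nn = NearestNeighbors(n_neighbors=min(k, len(Xc) - 1) + 1).fit(Xc)
        _, idx = nn.kneighbors(border)
        for _ in range(deficit):
            i = int(rng.integers(len(border)))
            j = idx[i, int(rng.integers(1, idx.shape[1]))]
            gap = rng.random()
            new_X.append(border[i] + gap * (Xc[j] - border[i]))
            new_y.append(c)
    if new_X:
        Xk = np.vstack([Xk, np.asarray(new_X)])
        yk = np.concatenate([yk, np.asarray(new_y)])
    return Xk, yk

In the paper the re-sampled data then feeds a classifier-independent, fusion-based evidential ensemble; under this sketch's assumptions, one would train each base learner on an independently re-sampled set (different seeds) and fuse the resulting predictions.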
Pages: 1-15 (15 pages)