Collective of Base Classifiers for Mining Imbalanced Data

Cited by: 0

Authors:

Jedrzejowicz, Joanna [1]

Jedrzejowicz, Piotr [2]

Affiliations:

[1] Univ Gdansk, Inst Informat, Fac Math Phys & Informat, PL-80308 Gdansk, Poland

[2] Gdynia Maritime Univ, Dept Informat Syst, PL-81225 Gdynia, Poland

Source:

COMPUTATIONAL SCIENCE, ICCS 2022, PT II | 2022

Keywords:

Imbalanced data; Oversampling; Gene expression programming;

DOI:

10.1007/978-3-031-08754-7_62

Chinese Library Classification:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Mining imbalanced datasets is a challenging problem. In this paper we address it by proposing the GEP-NB classifier, which is based on the oversampling technique. It combines two learning methods, Gene Expression Programming and Naive Bayes, which cooperate to produce a final prediction. At the pre-processing stage, a simple mechanism for generating synthetic minority class examples and balancing the training set is used. Next, two genes, g1 and g2, are evolved using Gene Expression Programming. They differ in that each is evolved with a different procedure for selecting synthetic minority class examples. If the class predicted by g1 agrees with the class predicted by g2, their decision is final. Otherwise, the final predictive decision is made by the Naive Bayes classifier. The approach is validated in an extensive computational experiment. Results produced by GEP-NB are compared with the performance of several state-of-the-art classifiers. The comparisons show that GEP-NB offers competitive performance.
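
The decision rule described in the abstract (accept the class when g1 and g2 agree, otherwise defer to Naive Bayes) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `gep_nb_predict` and the three toy classifiers are hypothetical stand-ins, and the evolved genes and the trained Naive Bayes model are assumed to be available as plain callables that return a class label.

```python
from typing import Callable, Sequence

# A classifier here is any callable mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def gep_nb_predict(g1: Classifier, g2: Classifier, nb: Classifier,
                   x: Sequence[float]) -> int:
    """Combine two gene predictions, falling back to Naive Bayes on disagreement."""
    first = g1(x)
    second = g2(x)
    if first == second:
        # Both genes agree, so their shared prediction is final.
        return first
    # The genes disagree, so the Naive Bayes classifier decides.
    return nb(x)


# Hypothetical stand-ins for the evolved genes and the Naive Bayes model.
# They are simple threshold rules used only to exercise the decision rule.
def toy_g1(x: Sequence[float]) -> int:
    return 1 if x[0] > 0.5 else 0


def toy_g2(x: Sequence[float]) -> int:
    return 1 if x[1] > 0.5 else 0


def toy_nb(x: Sequence[float]) -> int:
    return 1 if x[0] + x[1] > 0.8 else 0


if __name__ == "__main__":
    for sample in ([0.9, 0.9], [0.1, 0.1], [0.9, 0.1], [0.6, 0.1]):
        print(sample, gep_nb_predict(toy_g1, toy_g2, toy_nb, sample))
```

In this sketch the Naive Bayes stand-in is consulted only for the last two samples, where the two toy genes disagree.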


Pages: 571-585

Page count: 15

