Collective of Base Classifiers for Mining Imbalanced Data

Cited by: 0
Authors
Jedrzejowicz, Joanna [1 ]
Jedrzejowicz, Piotr [2 ]
Affiliations
[1] Univ Gdansk, Inst Informat, Fac Math Phys & Informat, PL-80308 Gdansk, Poland
[2] Gdynia Maritime Univ, Dept Informat Syst, PL-81225 Gdynia, Poland
Source
COMPUTATIONAL SCIENCE, ICCS 2022, PT II | 2022
Keywords
Imbalanced data; Oversampling; Gene expression programming;
DOI
10.1007/978-3-031-08754-7_62
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Mining imbalanced datasets is a challenging problem. In this paper we address it by proposing the GEP-NB classifier, which is based on oversampling. It combines two learning methods, Gene Expression Programming and Naive Bayes, which cooperate to produce a final prediction. At the pre-processing stage, a simple mechanism generates synthetic minority class examples and balances the training set. Next, two genes, g1 and g2, are evolved using Gene Expression Programming; each is evolved with a different procedure for selecting synthetic minority class examples. If the class predicted by g1 agrees with the class predicted by g2, that decision is final. Otherwise, the final decision is made by the Naive Bayes classifier. The approach is validated in an extensive computational experiment, in which the results produced by GEP-NB are compared with the performance of several state-of-the-art classifiers. The comparisons show that GEP-NB offers competitive performance.
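The decision rule described in the abstract (g1 and g2 vote, Naive Bayes breaks ties) can be sketched as follows. This is only an illustration under assumptions: the function name `combine_predictions` and the three toy classifiers are hypothetical stand-ins, not the authors' evolved genes or their Naive Bayes model, and the oversampling and Gene Expression Programming steps are not shown.

```python
from typing import Callable, Sequence

# A classifier here is any function mapping a feature vector to a class label.
Classifier = Callable[[Sequence[float]], int]


def combine_predictions(
    g1: Classifier,
    g2: Classifier,
    naive_bayes: Classifier,
    x: Sequence[float],
) -> int:
    """Return the shared label if g1 and g2 agree, else defer to naive_bayes."""
    p1 = g1(x)
    p2 = g2(x)
    if p1 == p2:
        return p1
    return naive_bayes(x)


# Hypothetical stand-ins for the two evolved genes and the Naive Bayes model.
def toy_g1(x: Sequence[float]) -> int:
    return int(x[0] > 0.5)


def toy_g2(x: Sequence[float]) -> int:
    return int(x[1] > 0.5)


def toy_nb(x: Sequence[float]) -> int:
    return int(sum(x) / len(x) > 0.6)


if __name__ == "__main__":
    for sample in ([0.9, 0.9], [0.1, 0.1], [0.9, 0.1], [0.9, 0.4]):
        print(sample, combine_predictions(toy_g1, toy_g2, toy_nb, sample))
```

In this sketch the third classifier is consulted only when the first two disagree, which mirrors the cooperation scheme stated in the abstract.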
Pages: 571-585
Number of pages: 15