Boosting the oversampling methods based on differential evolution strategies for imbalanced learning

Cited by: 19
Authors
Korkmaz, Sedat [1]
Sahman, Mehmet Akif [2]
Cinar, Ahmet Cevahir [3]
Kaya, Ersin [1]
Affiliations
[1] Konya Tech Univ, Fac Engn & Nat Sci, Dept Comp Engn, Konya, Turkey
[2] Selcuk Univ, Fac Technol, Dept Elect & Elect Engn, Konya, Turkey
[3] Selcuk Univ, Fac Technol, Dept Comp Engn, Konya, Turkey
Keywords
Imbalanced datasets; Differential evolution; Oversampling; Imbalanced learning; Class imbalance; Differential evolution strategies; PREPROCESSING METHOD; GLOBAL OPTIMIZATION; SOFTWARE TOOL; SMOTE; CLASSIFICATION; ALGORITHMS; KEEL;
DOI
10.1016/j.asoc.2021.107787
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The class imbalance problem is a challenging problem in the data mining area. To overcome the low classification performance associated with imbalanced datasets, sampling strategies are used to balance the datasets. Oversampling is a technique that increases the number of minority class samples in various proportions. In this work, 16 different differential evolution (DE) strategies are used to oversample imbalanced datasets for better classification. The main aim of this work is to determine the best strategy in terms of the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Geometric Mean (G-Mean) metrics. Forty-four imbalanced datasets are used in the experiments. Support Vector Machines (SVM), k-Nearest Neighbor (kNN), and Decision Tree (DT) are used as classifiers in the experiments. The best results are produced by the 6th Debohid Strategy (DSt6), the 1st Debohid Strategy (DSt1), and the 3rd Debohid Strategy (DSt3) with the kNN, DT, and SVM classifiers, respectively. The obtained results outperform 9 state-of-the-art oversampling methods in terms of the AUC and G-Mean metrics. (C) 2021 Elsevier B.V. All rights reserved.
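As a rough illustration of the two ideas named in the abstract, the sketch below generates synthetic minority samples with the classic DE/rand/1 mutation rule, v = x_r1 + F * (x_r2 - x_r3), and computes G-Mean as the square root of sensitivity times specificity. It is not the paper's procedure: the abstract does not specify the 16 strategies, and the function names, the scale factor F = 0.5, and the toy data are assumptions made only for this example.

```python
import math
import random


def de_rand_1_oversample(minority, n_new, f=0.5, seed=0):
    """Return n_new synthetic points built as x_r1 + f * (x_r2 - x_r3).

    Hypothetical helper: DE/rand/1 is one standard DE mutation rule,
    assumed here for illustration; it may differ from the paper's strategies.
    """
    if len(minority) < 3:
        raise ValueError("need at least three minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        # Pick three distinct minority samples at random.
        r1, r2, r3 = rng.sample(minority, 3)
        synthetic.append([a + f * (b - c) for a, b, c in zip(r1, r2, r3)])
    return synthetic


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy usage: four 2-D minority samples, five synthetic samples requested.
toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_samples = de_rand_1_oversample(toy_minority, n_new=5)
```

G-Mean is useful for imbalanced data because it drops to zero whenever either class is entirely misclassified, which plain accuracy does not capture.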
Pages: 19