A new oversampling approach based differential evolution on the safe set for highly imbalanced datasets

Cited by: 6
Authors
Zhang, Jiaoni [1]
Li, Yanying [1]
Zhang, Baoshuang [1]
Wang, Xialin [1]
Gong, Huanhuan [1]
Affiliations
[1] Baoji Univ Arts & Sci, Sch Math & Informat Sci, Baoji 721013, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class imbalance; Differential evolution; Oversampling; Imbalanced datasets; SAMPLING METHOD; SOFTWARE TOOL; SMOTE; ALGORITHMS; KEEL;
DOI
10.1016/j.eswa.2023.121039
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Oversampling methods are used to address class imbalance. However, some existing oversampling methods neither remove noisy samples effectively nor avoid synthesizing new noisy samples. We therefore propose a new oversampling approach based on differential evolution on the safe set for highly imbalanced datasets (SS_DEBOHID). SS_DEBOHID first uses the k-nearest neighbors (kNN) method to learn the safe area of the minority class; the DEBOHID oversampling method is then used to synthesize new minority samples in that safe area. SS_DEBOHID has three advantages: (a) it generates samples in the safe area, which reduces the generation of noisy samples and the number of synthetic samples that fall on the classification boundary or in the majority area; (b) it uses the DEBOHID method to synthesize samples, which increases sample diversity; (c) it is suitable for highly imbalanced datasets. The proposed method is compared with 10 methods on 43 highly imbalanced datasets and evaluated using the AUC and G_mean metrics. The experimental results show that SS_DEBOHID performs best on more than 30 datasets with the kNN, SVM, and DT classifiers in terms of AUC and G_mean, respectively. It outperforms the other methods by 8.07% to 24.34% in average AUC and by 6.96% to 45.37% in average G_mean. In addition, we validate the efficiency of SS_DEBOHID on 8 high-dimensional, large-sample datasets. The experimental results show that SS_DEBOHID has better classification performance and robustness.
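The two-stage idea in the abstract (find a safe minority set with kNN, then synthesize inside it with differential evolution) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `safe_minority` and `de_oversample`, the "more than half of the k neighbours are minority" safety rule, and the parameters `F`, `CR`, and `seed` are all assumptions made for illustration, and the synthesis step is a generic DE/rand/1 mutation with binomial crossover rather than the exact DEBOHID procedure.

```python
import math
import random


def safe_minority(X, y, minority_label, k=5):
    """Return minority samples whose k nearest neighbours are mostly minority.

    Assumption: "safe" means more than half of the k neighbours carry the
    minority label. The paper's exact criterion may differ.
    """
    safe = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        if yi != minority_label:
            continue
        # Distance from xi to every other sample, nearest first.
        others = sorted(
            ((math.dist(xi, xj), y[j]) for j, xj in enumerate(X) if j != i),
            key=lambda pair: pair[0],
        )
        neighbours = others[:k]
        same = sum(1 for _, label in neighbours if label == minority_label)
        if same > len(neighbours) / 2:
            safe.append(list(xi))
    return safe


def de_oversample(safe, n_new, F=0.5, CR=0.6, seed=0):
    """Synthesise n_new samples from the safe set with a generic DE step.

    Assumption: DE/rand/1 mutation plus binomial crossover against the base
    vector. This is a simplified stand-in for DEBOHID.
    """
    if len(safe) < 3:
        raise ValueError("need at least three safe minority samples")
    rng = random.Random(seed)
    dim = len(safe[0])
    synthetic = []
    for _ in range(n_new):
        # Three distinct safe samples: a base vector and a difference pair.
        r1, r2, r3 = rng.sample(safe, 3)
        mutant = [a + F * (b - c) for a, b, c in zip(r1, r2, r3)]
        # Binomial crossover: at least one coordinate comes from the mutant.
        j_rand = rng.randrange(dim)
        trial = [
            mutant[d] if (rng.random() < CR or d == j_rand) else r1[d]
            for d in range(dim)
        ]
        synthetic.append(trial)
    return synthetic
```

Because every parent is drawn from the safe set, a minority point sitting inside the majority region never contributes to a synthetic sample, which is the noise-reduction effect the abstract describes in advantage (a).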
Pages: 22