A new oversampling approach based differential evolution on the safe set for highly imbalanced datasets

Times cited: 6
Authors
Zhang, Jiaoni [1 ]
Li, Yanying [1 ]
Zhang, Baoshuang [1 ]
Wang, Xialin [1 ]
Gong, Huanhuan [1 ]
Affiliation
[1] Baoji Univ Arts & Sci, Sch Math & Informat Sci, Baoji 721013, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class imbalance; Differential evolution; Oversampling; Imbalanced datasets; SAMPLING METHOD; SOFTWARE TOOL; SMOTE; ALGORITHMS; KEEL;
DOI
10.1016/j.eswa.2023.121039
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Oversampling methods are used to address class imbalance. However, some existing oversampling methods neither remove noisy samples effectively nor avoid synthesizing new noisy samples. Therefore, we propose a new oversampling approach based on differential evolution on the safe set for highly imbalanced datasets (SS_DEBOHID). SS_DEBOHID first uses the k-nearest neighbors (kNN) method to identify the safe area of the minority class; the DEBOHID oversampling method is then used to synthesize new minority samples within that safe area. The advantages of SS_DEBOHID are that (a) it generates samples in the safe area, which reduces the generation of noisy samples and the number of synthetic samples falling on the classification boundary or in the majority area; (b) it uses the DEBOHID method to synthesize samples, which increases sample diversity; and (c) it is suitable for highly imbalanced datasets. The proposed method is compared with 10 methods on 43 highly imbalanced datasets and evaluated with the AUC and G_mean metrics. The experimental results show that SS_DEBOHID performs best on more than 30 datasets with each of the kNN, SVM, and DT classifiers, in terms of both AUC and G_mean. The proposed method outperforms the other methods by 8.07% to 24.34% in average AUC and by 6.96% to 45.37% in average G_mean. In addition, we validate the efficiency of SS_DEBOHID on 8 high-dimensional datasets with large sample sizes. The experimental results show that SS_DEBOHID offers better classification performance and greater robustness.
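The two-stage procedure described in the abstract (kNN-based selection of a safe minority set, then differential-evolution synthesis inside that set), together with the G_mean metric, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the safe-set rule (a minority sample is safe when more than half of its k nearest neighbours are also minority), the standard DE/rand/1/bin mutation and crossover operators, and the default values of k, F and CR are all assumptions, and the function names safe_minority, de_oversample and g_mean are hypothetical. G_mean is computed with its usual definition, the square root of sensitivity times specificity.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def safe_minority(X, y, minority_label, k=5, min_ratio=0.5):
    """Return the minority samples whose k nearest neighbours are mostly minority.

    Assumption (hypothetical rule): a minority sample is "safe" when more than
    `min_ratio` of its k nearest neighbours (itself excluded) are minority.
    The exact rule used by SS_DEBOHID may differ.
    """
    safe = []
    for i, xi in enumerate(X):
        if y[i] != minority_label:
            continue
        # Brute-force kNN: sort all other samples by distance to xi.
        dists = sorted((euclidean(xi, xj), j) for j, xj in enumerate(X) if j != i)
        neighbours = [j for _, j in dists[:k]]
        n_min = sum(1 for j in neighbours if y[j] == minority_label)
        if n_min / k > min_ratio:
            safe.append(list(xi))
    return safe


def de_oversample(safe, n_new, F=0.5, CR=0.9, seed=None):
    """Synthesize n_new samples using DE/rand/1/bin operators on the safe set.

    Assumption: standard DE mutation v = x_r1 + F * (x_r2 - x_r3) followed by
    binomial crossover with a randomly chosen target x_i from the safe set;
    DEBOHID's own operators may differ in detail.
    """
    if len(safe) < 4:
        raise ValueError("need at least 4 safe minority samples")
    rng = random.Random(seed)
    dim = len(safe[0])
    synthetic = []
    for _ in range(n_new):
        # Four distinct safe samples: one crossover target, three DE parents.
        i, r1, r2, r3 = rng.sample(range(len(safe)), 4)
        target = safe[i]
        mutant = [safe[r1][d] + F * (safe[r2][d] - safe[r3][d]) for d in range(dim)]
        j_rand = rng.randrange(dim)  # guarantees at least one mutant component
        trial = [mutant[d] if (rng.random() < CR or d == j_rand) else target[d]
                 for d in range(dim)]
        synthetic.append(trial)
    return synthetic


def g_mean(y_true, y_pred, positive):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(tpr * tnr)


if __name__ == "__main__":
    # Toy 2-D data: a compact minority cluster, one minority point lying
    # inside the majority region, and a majority cluster.
    X = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [10, 10.5],
         [10, 10], [10, 11], [11, 10], [11, 11], [10.5, 10.5], [9.5, 10]]
    y = [1] * 6 + [0] * 6
    safe = safe_minority(X, y, minority_label=1, k=3)
    new = de_oversample(safe, n_new=10, seed=0)
    print(len(safe), len(new))  # prints "5 10": [10, 10.5] is not safe
```

Because both the DE parents and the crossover target come only from the safe set, every synthetic coordinate in this sketch stays within an F-scaled margin of the safe samples' per-feature range, so the minority point sitting among the majority samples never seeds new data.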
Pages: 22
Related papers (49 in total)
[21]   Gyoten, Daiki .
Total Quality Science, 2020, 5 :64, DOI 10.17929/tqs.5.64
[22]   Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning [J].
Han, H ;
Wang, WY ;
Mao, BH .
ADVANCES IN INTELLIGENT COMPUTING, PT 1, PROCEEDINGS, 2005, 3644 :878-887
[23]   ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning [J].
He, Haibo ;
Bai, Yang ;
Garcia, Edwardo A. ;
Li, Shutao .
2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008 :1322-1328
[24]   KNNOR: An oversampling technique for imbalanced datasets [J].
Islam, Ashhadul ;
Belhaouari, Samir Brahim ;
Rehman, Atiq Ur ;
Bensmail, Halima .
APPLIED SOFT COMPUTING, 2022, 115
[25]   DEBOHID: A differential evolution based oversampling approach for highly imbalanced datasets [J].
Kaya, Ersin ;
Korkmaz, Sedat ;
Sahman, Mehmet Akif ;
Cinar, Ahmet Cevahir .
EXPERT SYSTEMS WITH APPLICATIONS, 2021, 169
[26]   Boosting the oversampling methods based on differential evolution strategies for imbalanced learning [J].
Korkmaz, Sedat ;
Sahman, Mehmet Akif ;
Cinar, Ahmet Cevahir ;
Kaya, Ersin .
APPLIED SOFT COMPUTING, 2021, 112
[27]   An efficient method to determine sample size in oversampling based on classification complexity for imbalanced data [J].
Lee, Dohyun ;
Kim, Kyoungok .
EXPERT SYSTEMS WITH APPLICATIONS, 2021, 184
[28]   A hybrid system for imbalanced data mining [J].
Lee, Zne-Jung ;
Lee, Chou-Yuan ;
Chou, So-Tsung ;
Ma, Wei-Ping ;
Ye, Fulan ;
Chen, Zhen .
MICROSYSTEM TECHNOLOGIES-MICRO-AND NANOSYSTEMS-INFORMATION STORAGE AND PROCESSING SYSTEMS, 2020, 26 (09) :3043-3047
[29]   Learning class-imbalanced data with region-impurity synthetic minority oversampling technique [J].
Li, Der-Chiang ;
Wang, Ssu-Yang ;
Huang, Kuan-Cheng ;
Tsai, Tung-I .
INFORMATION SCIENCES, 2022, 607 :1391-1407
[30]   A novel oversampling technique for class-imbalanced learning based on SMOTE and natural neighbors [J].
Li, Junnan ;
Zhu, Qingsheng ;
Wu, Quanwang ;
Fan, Zhu .
INFORMATION SCIENCES, 2021, 565 :438-455