Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification

Times cited: 11

Authors:

Tao, Xinmin [1]

Guo, Xinyue [2]

Zheng, Yujia [1]

Zhang, Xiaohan [1]

Chen, Zhiyu [1]

Affiliations:

[1] Northeast Forestry University, College of Civil Engineering and Transportation, 26 Hexing Road, Harbin 150040, Heilongjiang, People's Republic of China

[2] Northeast Forestry University, College of Mechanical and Electrical Engineering, Harbin 150040, Heilongjiang, People's Republic of China

Source:

KNOWLEDGE-BASED SYSTEMS | 2023 / Vol. 277

Funding:

National Natural Science Foundation of China

Keywords:

Imbalanced datasets; Oversampling; Classification; Overlapping; Within-class imbalance; OVER-SAMPLING TECHNIQUE; SMOTE; NOISY

DOI:

10.1016/j.knosys.2023.110795

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Learning from imbalanced datasets is a nontrivial task for the supervised learning community. Traditional classifiers may have difficulty learning the concept of the minority class in imbalanced classification, and the difficulty can worsen in the presence of other complicating factors such as class overlap, outliers and small disjuncts. In this paper, we propose a self-adaptive oversampling algorithm based on the complexity of minority data for imbalanced dataset classification. In the proposed algorithm, hyperspheres with different radii, determined by the imbalance ratio and the distances to the nearest enemy (opposite-class) neighbors, are first generated to cover all minority instances, subject to the constraint that no hypersphere contains a majority instance. Oversampling is then conducted only within these hyperspheres, so the synthetic minority instances cannot intrude into the majority region; this avoids creating class overlap while between-class balance is achieved. In addition, a self-adaptive strategy for assigning oversampling sizes is developed based on minority data complexity: hyperspheres with small radii that contain few instances are given more chances to be oversampled. This strategy helps address outliers and small disjuncts, because the hyperspheres covering them are usually small and contain few instances; they therefore generate more synthetic instances, which reduces the within-class imbalance caused by low density. Moreover, since the hyperspheres covering boundary minority instances are relatively small and are therefore assigned larger oversampling sizes, the proposed approach also strengthens the boundary information of the minority class, which benefits the subsequent learning task.
Extensive experimental results on various simulated and real-world imbalanced datasets show that the proposed method significantly outperforms other state-of-the-art oversampling methods. © 2023 Elsevier B.V. All rights reserved.
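The abstract describes the method only in words. The Python sketch below illustrates its three ingredients (majority-free hyperspheres around minority instances, a size allocation that favors small, sparsely populated hyperspheres, and sampling only inside the hyperspheres) under explicit assumptions. It is not the authors' implementation. The shrink factor `alpha` and the weight 1 / (radius × member count) are hypothetical stand-ins: the paper's radius also depends on the imbalance ratio and its allocation rule is based on a complexity measure, neither of which the abstract specifies. Uniform sampling inside each ball is likewise an assumed generation rule, and the function name `hypersphere_oversample` is hypothetical.

```python
import numpy as np


def hypersphere_oversample(X_min, X_maj, alpha=0.9, seed=0):
    """Illustrative hypersphere-constrained oversampling (NOT the authors' exact rules).

    alpha is a hypothetical shrink factor in (0, 1): each radius is alpha times the
    distance to the nearest majority ("enemy") instance, so no majority instance
    lies inside any hypersphere.
    Returns (synthetic_points, radii, per_sphere_sizes).
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n_min, dim = X_min.shape

    # Nearest-enemy distance for every minority instance (pairwise, O(n_min * n_maj) memory).
    d_enemy = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)
    radii = alpha * d_enemy

    # Minority instances inside each hypersphere, the centre itself included.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    counts = (d_min <= radii[:, None]).sum(axis=1)

    # Hypothetical complexity weight: small radius and few members -> larger share.
    # Spheres of radius 0 (minority point duplicated in the majority class) get no weight.
    weights = np.zeros(n_min)
    valid = radii > 0
    weights[valid] = 1.0 / (radii[valid] * counts[valid])

    # Number of synthetic instances needed for between-class balance.
    total = max(len(X_maj) - n_min, 0)
    if total == 0 or weights.sum() == 0:
        return np.empty((0, dim)), radii, np.zeros(n_min, dtype=int)
    raw = weights / weights.sum() * total
    sizes = np.floor(raw).astype(int)
    # Largest-remainder rounding so the sizes sum exactly to `total`.
    remainder = total - sizes.sum()
    order = np.argsort(raw - sizes)[::-1]
    sizes[order[:remainder]] += 1

    synthetic = []
    for centre, r, k in zip(X_min, radii, sizes):
        if k == 0:
            continue
        # Uniform sample in a dim-dimensional ball: random direction, radius r * U**(1/dim).
        directions = rng.normal(size=(k, dim))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        lengths = r * rng.random(k) ** (1.0 / dim)
        synthetic.append(centre + directions * lengths[:, None])
    return np.vstack(synthetic), radii, sizes
```

Calling `hypersphere_oversample(X_min, X_maj)` returns the synthetic minority points, each hypersphere's radius, and how many points each hypersphere received. With two nearby minority points and one isolated minority point whose nearest-enemy distances are similar, the isolated point's hypersphere (one member) receives the largest share, which mirrors the outlier and small-disjunct behavior the abstract describes. Because every sample stays within a radius strictly smaller than the nearest-enemy distance, no synthetic point is placed in a ball that contains a majority instance.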


Pages: 23
