Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification

Cited by: 11
Authors
Tao, Xinmin [1]
Guo, Xinyue [2]
Zheng, Yujia [1]
Zhang, Xiaohan [1]
Chen, Zhiyu [1]
Affiliations
[1] Northeast Forestry Univ, Coll Civil Engn & Transportat, 26 Hexing Rd, Harbin 150040, Heilongjiang, Peoples R China
[2] Northeast Forestry Univ, Coll Mech & Elect Engn, Harbin 150040, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced datasets; Oversampling; Classification; Overlapping; Within-class imbalance; OVER-SAMPLING TECHNIQUE; SMOTE; NOISY;
DOI
10.1016/j.knosys.2023.110795
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning from imbalanced datasets is a nontrivial task for the supervised learning community. Traditional classifiers may have difficulty learning the concept of the minority class in imbalanced classification, and the problem worsens in the presence of other complicating factors such as overlapping, outliers, and small disjuncts. In this paper, we propose a self-adaptive oversampling algorithm based on the complexity of the minority data for imbalanced dataset classification problems. In the proposed algorithm, hyperspheres with different radii, determined by the imbalance ratio and the distances to the nearest enemy neighbors, are first generated to cover all minority instances, subject to the constraint that no hypersphere contains any majority instance. Oversampling is then conducted only within these hyperspheres, so the generated synthetic minority instances cannot intrude into the majority space, which avoids introducing overlap while achieving between-class balance. In addition, a self-adaptive strategy for assigning oversampling sizes is developed based on the minority data complexity: hyperspheres with small radii and few instances are given more chances to be oversampled. This strategy helps address outliers and small disjuncts, since the hyperspheres covering them are usually small and contain few instances; they therefore generate more synthetic instances, which eliminates the within-class imbalance caused by lack of density. Moreover, since the hyperspheres covering boundary minority instances are relatively small and are thus assigned larger oversampling sizes, the proposed approach also strengthens the boundary information of the minority class, which benefits subsequent learning tasks. Extensive experimental results on various simulated and real-world imbalanced datasets show that the proposed method significantly outperforms other state-of-the-art oversampling methods. © 2023 Elsevier B.V. All rights reserved.
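The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the radius or weighting formulas, so the sketch makes its own assumptions. The function name `hypersphere_oversample`, the `shrink` factor (a stand-in for the imbalance-ratio-dependent radius rule), and the weight `1 / (radius * count)` are all hypothetical. What the sketch does preserve is the stated structure: one majority-free hypersphere per minority instance, sampling only inside those hyperspheres, and more samples for small, sparse hyperspheres.

```python
import numpy as np

def hypersphere_oversample(X_min, X_maj, n_new, shrink=0.9, seed=0):
    """Hypothetical sketch of hypersphere-constrained oversampling."""
    rng = np.random.default_rng(seed)
    dim = X_min.shape[1]

    # Distance from each minority instance to its nearest enemy (majority) neighbor.
    d_enemy = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)

    # Assumption: radius is a fixed fraction (< 1) of the nearest-enemy distance,
    # so no majority instance can lie inside any hypersphere.
    radii = np.maximum(shrink * d_enemy, 1e-12)

    # Number of minority instances inside each hypersphere (the center counts too).
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    counts = (d_min <= radii[:, None]).sum(axis=1)

    # Assumption: small, sparsely populated hyperspheres get larger sampling weight.
    weights = 1.0 / (radii * counts)
    probs = weights / weights.sum()

    # Pick a hypersphere for each synthetic instance, then sample uniformly inside it:
    # random unit direction, radius scaled by u**(1/dim) for uniform volume density.
    centers = rng.choice(len(X_min), size=n_new, p=probs)
    directions = rng.normal(size=(n_new, dim))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    scale = radii[centers] * rng.random(n_new) ** (1.0 / dim)
    return X_min[centers] + directions * scale[:, None]

# Toy usage: 200 majority points, 20 minority points, balance the classes.
data_rng = np.random.default_rng(42)
X_maj = data_rng.normal(0.0, 1.0, size=(200, 2))
X_min = data_rng.normal(2.5, 0.5, size=(20, 2))
synth = hypersphere_oversample(X_min, X_maj, n_new=180)
```

Because every synthetic point lies within a hypersphere whose radius is strictly smaller than the distance from its center to the nearest majority instance, the new points stay on the minority side of that nearest enemy, which is the overlap-avoidance property the abstract describes.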
Pages: 23
Related papers
50 in total
  • [1] A MeanShift-guided oversampling with self-adaptive sizes for imbalanced data classification
    Tao, Xinmin
    Zhang, Xiaohan
    Zheng, Yujia
    Qi, Lin
    Fan, Zhiting
    Huang, Shan
    INFORMATION SCIENCES, 2024, 672
  • [2] Local distribution-based adaptive minority oversampling for imbalanced data classification
    Wang, Xinyue
    Xu, Jian
    Zeng, Tieyong
    Jing, Liping
    NEUROCOMPUTING, 2021, 422 : 200 - 213
  • [3] Minority Oversampling in Kernel Adaptive Subspaces for Class Imbalanced Datasets
    Lin, Chin-Teng
    Hsieh, Tsung-Yu
    Liu, Yu-Ting
    Lin, Yang-Yin
    Fang, Chieh-Ning
    Wang, Yu-Kai
    Yen, Gary
    Pal, Nikhil R.
    Chuang, Chun-Hsiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2018, 30 (05) : 950 - 962
  • [4] Adaptive Oversampling for Imbalanced Data Classification
    Ertekin, Seyda
    INFORMATION SCIENCES AND SYSTEMS 2013, 2013, 264 : 261 - 269
  • [5] Clustering-based improved adaptive synthetic minority oversampling technique for imbalanced data classification
    Jin, Dian
    Xie, Dehong
    Liu, Di
    Gong, Murong
    INTELLIGENT DATA ANALYSIS, 2023, 27 (03) : 635 - 652
  • [6] A Novel Adaptive Minority Oversampling Technique for Improved Classification in Data Imbalanced Scenarios
    Tripathi, Ayush
    Chakraborty, Rupayan
    Kopparapu, Sunil Kumar
    2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021 : 10650 - 10657
  • [7] An efficient method to determine sample size in oversampling based on classification complexity for imbalanced data
    Lee, Dohyun
    Kim, Kyoungok
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 184 (184)
  • [8] An Adaptive Oversampling Technique for Imbalanced Datasets
    Shahee, Shaukat Ali
    Ananthakumar, Usha
    ADVANCES IN DATA MINING: APPLICATIONS AND THEORETICAL ASPECTS (ICDM 2018), 2018, 10933 : 1 - 16
  • [9] A novel adaptive boundary weighted and synthetic minority oversampling algorithm for imbalanced datasets
    Song, Xudong
    Chen, Yilin
    Liang, Pan
    Wan, Xiaohui
    Cui, Yunxian
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2023, 44 (02) : 3245 - 3259
  • [10] Counterfactual-based minority oversampling for imbalanced classification
    Wang, Shu
    Luo, Hao
    Huang, Shanshan
    Li, Qingsong
    Liu, Li
    Su, Guoxin
    Liu, Ming
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 122