Self-adaptive oversampling method based on the complexity of minority data in imbalanced datasets classification

Cited by: 11
Authors
Tao, Xinmin [1]
Guo, Xinyue [2]
Zheng, Yujia [1]
Zhang, Xiaohan [1]
Chen, Zhiyu [1]
Affiliations
[1] Northeast Forestry Univ, Coll Civil Engn & Transportat, 26 Hexing Rd, Harbin 150040, Heilongjiang, Peoples R China
[2] Northeast Forestry Univ, Coll Mech & Elect Engn, Harbin 150040, Heilongjiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Imbalanced datasets; Oversampling; Classification; Overlapping; Within-class imbalance; OVER-SAMPLING TECHNIQUE; SMOTE; NOISY;
DOI
10.1016/j.knosys.2023.110795
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning from imbalanced datasets is a nontrivial task for the supervised learning community. Traditional classifiers may have difficulty learning the concept of the minority class in imbalanced classification, and the problem worsens in the presence of other complicating factors such as overlapping, outliers, and small disjuncts. In this paper, we propose a self-adaptive oversampling algorithm based on the complexity of minority data for imbalanced dataset classification problems. In the proposed algorithm, hyperspheres with different radii, determined by the imbalance ratio and the distances to the nearest enemy neighbors, are first generated to cover all minority instances, subject to the constraint that no hypersphere contains any majority instance. Oversampling is then conducted only within these hyperspheres, so the generated synthetic minority instances cannot intrude into the majority space, which avoids introducing overlap while achieving between-class balance. In addition, a self-adaptive strategy for assigning oversampling sizes is developed based on the minority data complexity: hyperspheres with small radii and few instances are given more chances to be oversampled. This strategy helps address outliers and small disjuncts, since the hyperspheres covering them are usually small and contain few instances, so they generate more synthetic instances and thus eliminate the within-class imbalance caused by lack of density. Moreover, since the hyperspheres covering boundary minority instances are relatively small and are therefore assigned larger oversampling sizes, the proposed approach also strengthens the boundary information of the minority class, which benefits subsequent learning tasks. Extensive experimental results on various simulated and real-world imbalanced datasets show that the proposed method significantly outperforms other state-of-the-art oversampling methods. © 2023 Elsevier B.V. All rights reserved.
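The hypersphere idea in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the abstract gives no formulas, so the radius rule (a hypothetical `shrink` factor times the nearest-enemy distance, standing in for the paper's dependence on the imbalance ratio) and the weighting rule (inverse of radius times instance count) are assumptions chosen only to match the qualitative description. The function name `hypersphere_oversample` is likewise made up for illustration.

```python
import numpy as np

def hypersphere_oversample(X_min, X_maj, n_new, shrink=0.9, seed=0):
    """Illustrative sketch only; radius and weight formulas are assumptions."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    X_maj = np.asarray(X_maj, dtype=float)
    n, d = X_min.shape

    # Distance from each minority instance to its nearest majority ("enemy") instance.
    d_enemy = np.linalg.norm(X_min[:, None, :] - X_maj[None, :, :], axis=2).min(axis=1)

    # Assumed radius rule: strictly smaller than the nearest-enemy distance,
    # so no majority instance can lie inside a hypersphere.
    radii = shrink * d_enemy

    # Number of minority instances inside each hypersphere (its center included).
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    counts = (d_min <= radii[:, None]).sum(axis=1)

    # Assumed weighting: small, sparsely populated hyperspheres get more samples.
    weights = 1.0 / (radii * counts + 1e-12)
    weights /= weights.sum()

    # Pick a hypersphere for every synthetic instance, then sample uniformly inside it.
    centers = rng.choice(n, size=n_new, p=weights)
    directions = rng.normal(size=(n_new, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    r = radii[centers] * rng.random(n_new) ** (1.0 / d)
    X_new = X_min[centers] + directions * r[:, None]
    return X_new, centers, radii
```

Because every synthetic point stays within a ball whose radius is below the distance to the nearest majority instance, the sampled regions exclude all majority instances, which is the property the abstract relies on to avoid creating overlap. The pairwise distance matrices make this sketch quadratic in memory, so it is suited only to small examples.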
Pages: 23
Related papers
50 in total
  • [21] Gaussian Distribution Based Oversampling for Imbalanced Data Classification
    Xie, Yuxi
    Qiu, Min
    Zhang, Haibo
    Peng, Lizhi
    Chen, Zhenxiang
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (02) : 667 - 679
  • [22] An Adaptive and Robust Method for Oriented Oversampling With Spatial Information for Imbalanced Noisy Datasets
    Deng, Yi
    Li, Mingyong
    IEEE ACCESS, 2023, 11 : 122610 - 122624
  • [23] A new instance density-based synthetic minority oversampling method for imbalanced classification problems
    Ma, Chung-Kang
    Park, You-Jin
    ENGINEERING OPTIMIZATION, 2022, 54 (10) : 1743 - 1757
  • [24] LSMOTE: A link-based Synthetic Minority Oversampling Technique for binary imbalanced datasets
    Cai, Qin-Nan
    Zhang, Zhong-Liang
    Wu, Yu-Heng
    Zhang, Xiu-Ming
    NEUROCOMPUTING, 2024, 608
  • [25] A quantum-based oversampling method for classification of highly imbalanced and overlapped data
    Yang, Bei
    Tian, Guilan
    Luttrell, Joseph
    Gong, Ping
    Zhang, Chaoyang
    EXPERIMENTAL BIOLOGY AND MEDICINE, 2023, 248 (24) : 2500 - 2513
  • [26] Evidence-based adaptive oversampling algorithm for imbalanced classification
    Lin, Chen-ju
    Leony, Florence
    KNOWLEDGE AND INFORMATION SYSTEMS, 2024, 66 (03) : 2209 - 2233
  • [28] A Novel Oversampling Method for Imbalanced Datasets Based on Density Peaks Clustering
    Cao, Jie
    Shi, Yong
    TEHNICKI VJESNIK-TECHNICAL GAZETTE, 2021, 28 (06) : 1813 - 1819
  • [29] Adaptive Fusion Based Method for Imbalanced Data Classification
    Liang, Zefeng
    Wang, Huan
    Yang, Kaixiang
    Shi, Yifan
    FRONTIERS IN NEUROROBOTICS, 2022, 16
  • [30] An oversampling method for imbalanced data based on spatial distribution of minority samples SD-KMSMOTE
    Wensheng Yang
    Chengsheng Pan
    Yanyan Zhang
    Scientific Reports, 12