OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Cited by: 1
Authors
Li, Junnan [1]
Zhu, Qingsheng [1]
Affiliations
[1] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor; SAMPLING METHOD; SMOTE; NEIGHBOR;
DOI
10.1007/s10489-023-05030-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
SMOTE has been favored by researchers for improving imbalanced classification. Nevertheless, imbalances within minority classes and noise generation remain its two main challenges. Recently, clustering-based oversampling methods have been developed to improve SMOTE by eliminating imbalances within minority classes and/or overcoming noise generation. Yet they still suffer from the following problems: a) some create more synthetic minority samples in large-size or high-density regions; b) most fail to remove noise from the training set; c) most rely heavily on more than one parameter; d) most cannot handle non-spherical data; e) almost all of the adopted clustering methods are poorly suited to class-imbalanced data. To overcome these issues, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) method is proposed to partition the class-imbalanced training set into separate sub-clusters of different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed, calculated by weighing the sample number and density of each minority class sub-cluster. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Extensive experiments show that OALDPC outperforms eight state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network and XGBoost on synthetic data and on a wide range of real benchmark data sets from industrial applications.
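The third and fourth steps of the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual weight formula or interpolation rule, so everything below is an assumption made for illustration: the weight is taken as inversely proportional to the product of sub-cluster size and density, and new samples are drawn SMOTE-style on segments between two points of the same sub-cluster. The function names `allocate_synthetic_counts` and `interpolate_within_cluster` are hypothetical, and the sub-cluster sizes and densities are supplied by hand rather than computed by LDPC.

```python
import numpy as np


def allocate_synthetic_counts(sizes, densities, n_new):
    """Split n_new synthetic samples across minority sub-clusters.

    Assumed weighting (not the paper's formula): smaller and sparser
    sub-clusters receive larger weights.
    """
    sizes = np.asarray(sizes, dtype=float)
    densities = np.asarray(densities, dtype=float)
    raw = 1.0 / (sizes * densities)
    weights = raw / raw.sum()
    counts = np.floor(weights * n_new).astype(int)
    # Give the leftover samples to the largest fractional shares.
    remainder = n_new - counts.sum()
    order = np.argsort(-(weights * n_new - counts))
    counts[order[:remainder]] += 1
    return weights, counts


def interpolate_within_cluster(points, n_samples, rng):
    """Create samples on segments between two distinct points of one sub-cluster."""
    points = np.asarray(points, dtype=float)
    if n_samples == 0 or len(points) < 2:
        return np.empty((0, points.shape[1]))
    i = rng.integers(0, len(points), size=n_samples)
    j = rng.integers(0, len(points) - 1, size=n_samples)
    j = j + (j >= i)  # shift past i so the two endpoints always differ
    t = rng.random((n_samples, 1))
    return points[i] + t * (points[j] - points[i])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical minority sub-clusters: one large and dense, one small and sparse.
    weights, counts = allocate_synthetic_counts([50, 5], [2.0, 0.5], n_new=20)
    print(weights, counts)
    sparse_cluster = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
    print(interpolate_within_cluster(sparse_cluster, counts[1], rng))
```

Under this assumed weighting the small, sparse sub-cluster receives nearly all of the new samples, which matches the stated goal of creating more synthetic samples in sparser minority regions. Keeping both endpoints inside one sub-cluster avoids interpolating across gaps between sub-clusters.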
Pages: 30987-31017
Page count: 31