OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Authors
Junnan Li
Qingsheng Zhu
Affiliation
[1] Chongqing Industry Polytechnic College, School of Artificial Intelligence and Big Data
Source
Applied Intelligence, 2023, Vol. 53, No. 24
Keywords
Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor
DOI
Not available
Abstract
SMOTE has been favored by researchers for improving imbalanced classification. Nevertheless, imbalance within minority classes and noise generation are two main challenges for SMOTE. Recently, clustering-based oversampling methods have been developed to improve SMOTE by eliminating imbalance within minority classes and/or overcoming noise generation. Yet they still suffer from the following problems: (a) some create more synthetic minority samples in large or high-density regions; (b) most fail to remove noise from the training set; (c) most rely heavily on more than one parameter; (d) most cannot handle non-spherical data; (e) almost all of the clustering methods they adopt are poorly suited to class-imbalanced data. To overcome these issues of existing clustering-based oversampling methods, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) method is proposed to partition the class-imbalanced training set into separate sub-clusters of different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed and calculated for each minority class sub-cluster by weighing its number of samples and its density. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Extensive experiments show that OALDPC outperforms 8 state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network, and XGBoost on synthetic data and on extensive real benchmark data sets from industrial applications.
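The abstract names the ingredients of OALDPC but gives no formulas. The Python sketch below only illustrates the general idea of density-aware, cluster-wise oversampling, under loudly stated assumptions: the minority sub-clusters are taken as given (LDPC and its noise filter are not reproduced), local density is a simple k-nearest-neighbour estimate, the hypothetical sampling weight 1/(n_c · mean density) stands in for the paper's weight, and new samples come from standard SMOTE interpolation x + u(y - x) rather than the paper's LDPC-based interpolation. The function names (`knn_density`, `allocate`, `smote_in_cluster`, `oversample`) are illustrative, not from the paper.

```python
import math
import random

# Minimal sketch, NOT the OALDPC algorithm from the paper. Assumptions:
#  * minority sub-clusters are already given (no LDPC clustering or noise filter);
#  * local density = 1 / mean distance to the k nearest neighbours;
#  * sub-cluster weight = 1 / (size * mean density), a hypothetical stand-in;
#  * new points use plain SMOTE interpolation x + u * (y - x).


def knn_density(points, k):
    """Inverse mean distance from each point to its k nearest neighbours."""
    dens = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        dens.append(1.0 / (sum(nearest) / len(nearest) + 1e-12))
    return dens


def allocate(clusters, n_synthetic, k):
    """Split n_synthetic samples across sub-clusters, favouring small, sparse ones."""
    raw = []
    for c in clusters:
        if len(c) < 2:
            raise ValueError("each sub-cluster needs at least 2 points")
        d = knn_density(c, k)
        mean_d = sum(d) / len(d)
        raw.append(1.0 / (len(c) * mean_d))
    total = sum(raw)
    shares = [n_synthetic * r / total for r in raw]
    counts = [math.floor(s) for s in shares]
    # Largest-remainder rounding so the counts sum exactly to n_synthetic.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - counts[i], reverse=True)
    for i in order[: n_synthetic - sum(counts)]:
        counts[i] += 1
    return counts


def smote_in_cluster(cluster, n_new, k, rng):
    """SMOTE-style interpolation between a seed and one of its k nearest neighbours."""
    out = []
    for _ in range(n_new):
        i = rng.randrange(len(cluster))
        x = cluster[i]
        others = sorted((j for j in range(len(cluster)) if j != i),
                        key=lambda j: math.dist(x, cluster[j]))
        y = cluster[rng.choice(others[:k])]
        u = rng.random()  # u ~ U[0, 1)
        out.append(tuple(a + u * (b - a) for a, b in zip(x, y)))
    return out


def oversample(clusters, n_synthetic, k=3, seed=0):
    """Allocate synthetic samples per sub-cluster, then interpolate inside each one."""
    rng = random.Random(seed)
    counts = allocate(clusters, n_synthetic, k)
    synthetic = []
    for cluster, m in zip(clusters, counts):
        synthetic.extend(smote_in_cluster(cluster, m, k, rng))
    return counts, synthetic


# Usage: a dense 6-point sub-cluster and a sparse 3-point sub-cluster.
dense = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.0, 0.1), (0.1, 0.1), (0.2, 0.1)]
sparse = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
counts, synthetic = oversample([dense, sparse], n_synthetic=10)
print(counts, len(synthetic))
```

Weighting by the inverse of size times density is just one simple way to steer synthetic samples toward small, sparse regions, which the abstract names as a goal; the paper's actual weight and interpolation scheme may differ. Because each new point lies on a segment between two members of the same sub-cluster, it stays inside that sub-cluster's convex hull.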
Pages: 30987-31017
Number of pages: 30