OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Cited by: 1
Authors
Li, Junnan [1]
Zhu, Qingsheng [1]
Affiliations
[1] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor; SAMPLING METHOD; SMOTE; NEIGHBOR;
DOI
10.1007/s10489-023-05030-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
SMOTE has been favored by researchers for improving imbalanced classification. Nevertheless, imbalances within minority classes and noise generation are two main challenges in SMOTE. Recently, clustering-based oversampling methods have been developed to improve SMOTE by eliminating imbalances within minority classes and/or overcoming noise generation. Yet they still suffer from the following challenges: a) some create more synthetic minority samples in large-size or high-density regions; b) most fail to remove noise from the training set; c) most rely heavily on more than one parameter; d) most cannot handle non-spherical data; e) almost all of the adopted clustering methods are poorly suited to class-imbalanced data. To overcome these issues of existing clustering-based oversampling methods, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) is proposed to partition the class-imbalanced training set into separated sub-clusters with different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed and calculated by weighing the sample number and density of each minority class sub-cluster. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Extensive experiments show that OALDPC outperforms 8 state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network and XGBoost on synthetic data and a wide range of real benchmark data sets from industrial applications.
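The third and fourth steps of the abstract (a sampling weight per minority sub-cluster, then interpolation that favors sparser regions) can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not give the weight formula, so the weight below is assumed to be proportional to 1 / (size × density), the sub-clusters and their densities are assumed to be supplied by some prior clustering step, and the function names `allocate_synthetic_counts` and `interpolate_within_cluster` are hypothetical.

```python
import numpy as np


def allocate_synthetic_counts(sizes, densities, n_synthetic):
    """Split n_synthetic across minority sub-clusters.

    ASSUMPTION: weight_i is proportional to 1 / (size_i * density_i), so
    small and sparse sub-clusters receive more synthetic samples. The
    abstract does not state the actual OALDPC weight formula.
    """
    sizes = np.asarray(sizes, dtype=float)
    densities = np.asarray(densities, dtype=float)
    raw = 1.0 / (sizes * densities)
    weights = raw / raw.sum()
    counts = np.floor(weights * n_synthetic).astype(int)
    # Hand any rounding remainder to the sub-clusters with the largest weights.
    remainder = max(int(n_synthetic - counts.sum()), 0)
    order = np.argsort(-weights)
    counts[order[:remainder]] += 1
    return weights, counts


def interpolate_within_cluster(points, n_new, rng):
    """SMOTE-style linear interpolation between random pairs of points
    drawn from the same sub-cluster (points is a 2-D array, one row per sample).

    Returns an empty array if the sub-cluster has fewer than two points.
    """
    points = np.asarray(points, dtype=float)
    if n_new == 0 or len(points) < 2:
        return np.empty((0, points.shape[1]))
    i = rng.integers(0, len(points), size=n_new)
    j = rng.integers(0, len(points), size=n_new)
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return points[i] + gap * (points[j] - points[i])


# Toy usage: one large dense sub-cluster and one small sparse sub-cluster.
rng = np.random.default_rng(0)
dense = rng.normal(loc=0.0, scale=0.2, size=(50, 2))
sparse = rng.normal(loc=5.0, scale=1.0, size=(10, 2))
weights, counts = allocate_synthetic_counts(
    sizes=[len(dense), len(sparse)],
    densities=[2.0, 0.5],  # illustrative density values, not computed by LDPC
    n_synthetic=100,
)
synthetic = np.vstack([
    interpolate_within_cluster(dense, counts[0], rng),
    interpolate_within_cluster(sparse, counts[1], rng),
])
```

Because each synthetic point is a convex combination of two points from the same sub-cluster, it stays inside that sub-cluster's bounding box, which is one simple way to avoid generating samples in the gaps between sub-clusters.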
Pages: 30987-31017
Page count: 31
Related papers
50 in total
  • [41] Combining Random Subspace Approach with SMOTE Oversampling for Imbalanced Data Classification
    Ksieniewicz, Pawel
    HYBRID ARTIFICIAL INTELLIGENT SYSTEMS, HAIS 2019, 2019, 11734 : 660 - 673
  • [42] Imbalanced Data Classification Based on Clustering
    Li, Hu
    Zou, Peng
    Han, Weihong
    Xia, Rongze
    COMPUTER-AIDED DESIGN, MANUFACTURING, MODELING AND SIMULATION III, 2014, 443 : 741 - 745
  • [43] WRND: A weighted oversampling framework with relative neighborhood density for imbalanced noisy classification
    Li, Min
    Zhou, Hao
    Liu, Qun
    Gong, Xu
    Wang, Guoyin
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 241
  • [44] A non-parameter oversampling approach for imbalanced data classification based on hybrid natural neighbors
    Lin, Junyue
    Liang, Lu
    APPLIED INTELLIGENCE, 2025, 55 (05)
  • [45] SVDD boundary and DPC clustering technique-based oversampling approach for handling imbalanced and overlapped data
    Tao, Xinmin
    Chen, Wei
    Zhang, Xiaohan
    Guo, Wenjie
    Qi, Lin
    Fan, Zhiting
    KNOWLEDGE-BASED SYSTEMS, 2021, 234
  • [46] Local Clustering Conformal Predictor for Imbalanced Data Classification
    Wang, Huazhen
    Chen, Yewang
    Chen, Zhigang
    Yang, Fan
    ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, AIAI 2013, 2013, 412 : 421 - 431
  • [47] LD-SMOTE: A Novel Local Density Estimation-Based Oversampling Method for Imbalanced Datasets
    Lyu, Jiacheng
    Yang, Jie
    Su, Zhixun
    Zhu, Zilu
    SYMMETRY-BASEL, 2025, 17 (02)
  • [48] Imbalanced Classification Based on Minority Clustering Synthetic Minority Oversampling Technique With Wind Turbine Fault Detection Application
    Yi, Huaikuan
    Jiang, Qingchao
    Yan, Xuefeng
    Wang, Bei
    IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (09) : 5867 - 5875
  • [49] Clustering ensemble based on density peaks
    Chu R.-H.
    Wang H.-J.
    Yang Y.
    Li T.-R.
    Wang, Hong-Jun (wanghongjun@swjtu.edu.cn), 1600, Science Press (42): : 1401 - 1412
  • [50] Adaptive weighted over-sampling for imbalanced datasets based on density peaks clustering with heuristic filtering
    Tao, Xinmin
    Li, Qing
    Guo, Wenjie
    Ren, Chao
    He, Qing
    Liu, Rui
    Zou, JunRong
    INFORMATION SCIENCES, 2020, 519 : 43 - 73