OALDPC: oversampling approach based on local density peaks clustering for imbalanced classification

Cited by: 1
Authors
Li, Junnan [1]
Zhu, Qingsheng [1]
Affiliation
[1] Chongqing Ind Polytech Coll, Sch Artificial Intelligence & Big Data, Chongqing 401120, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Class-imbalanced learning; Class-imbalanced classification; Oversampling technique; Local density peaks; Natural neighbor; SAMPLING METHOD; SMOTE; NEIGHBOR;
DOI
10.1007/s10489-023-05030-4
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
SMOTE has been widely favored by researchers for improving imbalanced classification. Nevertheless, imbalances within minority classes and noise generation remain two main challenges for SMOTE. Recently, clustering-based oversampling methods have been developed to improve SMOTE by eliminating imbalances within minority classes and/or overcoming noise generation. Yet they still suffer from the following challenges: a) some create more synthetic minority samples in large-size or high-density regions; b) most fail to remove noise from the training set; c) most rely heavily on more than one parameter; d) most cannot handle non-spherical data; e) almost all of the adopted clustering methods are poorly suited to class-imbalanced data. To overcome these issues of existing clustering-based oversampling methods, this paper proposes a novel oversampling approach based on local density peaks clustering (OALDPC). First, a novel local density peaks clustering (LDPC) method is proposed to partition the class-imbalanced training set into separated sub-clusters with different sizes and densities. Second, a novel LDPC-based noise filter is proposed to identify and remove suspicious noise from the class-imbalanced training set. Third, a novel sampling weight is proposed and calculated by weighing the sample number and density of each minority class sub-cluster. Fourth, a novel interpolation method based on the sampling weight and LDPC is proposed to create more synthetic minority class samples in sparser minority class regions. Intensive experiments show that OALDPC outperforms 8 state-of-the-art oversampling techniques in improving the F-measure and G-mean of Random Forest, Neural Network and XGBoost on synthetic data and extensive real benchmark data sets from industrial applications.
Pages: 30987-31017
Number of pages: 31
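
The third and fourth steps of the abstract (a sampling weight per minority sub-cluster, then interpolation that favors sparser regions) can be illustrated with a minimal sketch. This is not the paper's method: the abstract does not give the weight formula, so the weight below (mean k-nearest-neighbor distance divided by sub-cluster size) is a hypothetical stand-in, the sub-cluster labels are assumed to be given rather than produced by LDPC, the noise filter is omitted, and all function names are invented for illustration.

```python
import numpy as np


def cluster_sampling_weights(X_min, labels, k=3):
    """Hypothetical weight: sparser and smaller sub-clusters score higher.

    Assumption, not the paper's formula: score = mean k-NN distance / size.
    """
    ids = np.unique(labels)
    scores = []
    for c in ids:
        pts = X_min[labels == c]
        n = len(pts)
        if n < 2:
            raise ValueError("each sub-cluster needs at least two samples")
        # Pairwise Euclidean distances inside the sub-cluster.
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        d.sort(axis=1)  # column 0 is each point's zero distance to itself
        kk = min(k, n - 1)
        sparsity = d[:, 1:kk + 1].mean()
        scores.append(sparsity / n)
    scores = np.asarray(scores)
    return ids, scores / scores.sum()


def allocate(weights, n_new):
    """Split n_new across sub-clusters by weight (largest-remainder rounding)."""
    raw = weights * n_new
    counts = np.floor(raw).astype(int)
    remainder = int(n_new - counts.sum())
    order = np.argsort(raw - counts)[::-1]
    counts[order[:remainder]] += 1
    return counts


def oversample(X_min, labels, n_new, k=3, seed=0):
    """SMOTE-style interpolation restricted to pairs within one sub-cluster."""
    rng = np.random.default_rng(seed)
    ids, weights = cluster_sampling_weights(X_min, labels, k)
    counts = allocate(weights, n_new)
    synthetic = []
    for c, m in zip(ids, counts):
        pts = X_min[labels == c]
        for _ in range(m):
            i, j = rng.choice(len(pts), size=2, replace=False)
            # New sample lies on the segment between two same-cluster points.
            synthetic.append(pts[i] + rng.random() * (pts[j] - pts[i]))
    X_new = np.array(synthetic).reshape(-1, X_min.shape[1])
    return X_new, dict(zip(ids.tolist(), counts.tolist()))


# Toy minority class: one dense sub-cluster (label 0), one sparse (label 1).
rng = np.random.default_rng(42)
dense = rng.normal(0.0, 0.1, size=(20, 2))
sparse = rng.normal(5.0, 1.0, size=(5, 2))
X_min = np.vstack([dense, sparse])
labels = np.array([0] * 20 + [1] * 5)

X_new, counts = oversample(X_min, labels, n_new=30)
print(counts)  # the sparse sub-cluster receives most of the 30 samples
```

Interpolating only between points of the same sub-cluster keeps synthetic samples from bridging separated minority regions, which is one plausible reading of why the abstract ties interpolation to the clustering result.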