A novel progressively undersampling method based on the density peaks sequence for imbalanced data

Cited by: 56
Authors
Xie, Xiaoying [1]
Liu, Huawen [2]
Zeng, Shouzhen [3]
Lin, Lingbin [4]
Li, Wen [5]
Affiliations
[1] Zhejiang Normal Univ, Coll Econ & Management, Jinhua 321004, Zhejiang, Peoples R China
[2] Zhejiang Normal Univ, Coll Math & Comp Sci, Jinhua 321004, Zhejiang, Peoples R China
[3] Ningbo Univ, Sch Business, Ningbo 315211, Peoples R China
[4] Zhejiang Normal Univ, Student Management Off, Jinhua 321004, Zhejiang, Peoples R China
[5] Curtin Univ, Dept Math & Stat, Perth, WA 6845, Australia
Keywords
Progressive undersampling; Density peaks sequence; Importance degree; Optimal undersampling size; Imbalanced data;
DOI
10.1016/j.knosys.2020.106689
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Undersampling is a widely used resampling technique for imbalanced data. Traditional undersampling techniques typically reduce the majority classes to the same scale as the minority classes and therefore tend to discard valuable information, so many strategies, such as clustering, have been developed. However, two essential problems remain open: which instances, and how many, should be extracted during undersampling. To alleviate these two problems, we propose a novel undersampling method for imbalanced data. It exploits a sequence of density peaks to progressively extract instances from the majority classes. Specifically, two factors are introduced to measure the importance degree of each instance in the majority classes. With these two factors, we generate a sampling sequence ordered by the importance of instances for classification. Furthermore, the optimal undersampling size of the majority classes is determined automatically by progressively extracting the important instances from the sequence. To evaluate the effectiveness of the proposed method, a series of experiments comparing it with six popular undersampling methods was conducted on 40 public benchmark datasets. The experimental results show that the proposed method outperforms the state-of-the-art undersampling methods. (C) 2020 Elsevier B.V. All rights reserved.
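The general idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the two importance factors, so the sketch assumes they are the local density `rho` and the distance `delta` to the nearest higher-density point used in standard density peaks clustering, and it assumes the importance degree is their product. The function names, the Gaussian kernel, the cutoff `dc`, the `step` size, and the toy nearest-centroid G-mean score are all hypothetical choices made only for illustration.

```python
import numpy as np


def density_peak_scores(X, dc=None):
    """Return (rho, delta) per row of X, as in density peaks clustering (assumed factors)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    n = len(X)
    if dc is None:
        # Assumed cutoff: 20th percentile of the pairwise distances.
        dc = np.percentile(d[np.triu_indices(n, k=1)], 20)
    # Gaussian-kernel local density; subtract 1 to exclude the point itself.
    rho = np.exp(-((d / dc) ** 2)).sum(axis=1) - 1.0
    delta = np.empty(n)
    for i in range(n):
        higher = rho > rho[i]
        # Distance to the nearest denser point; the densest point gets its max distance.
        delta[i] = d[i, higher].min() if higher.any() else d[i].max()
    return rho, delta


def sampling_sequence(X_maj):
    """Order majority instances by an assumed importance degree rho * delta (descending)."""
    rho, delta = density_peak_scores(X_maj)
    return np.argsort(-(rho * delta))


def progressive_undersample(X_maj, X_min, score_fn, step):
    """Grow the selected majority subset along the sequence and keep the best-scoring size."""
    order = sampling_sequence(X_maj)
    step = max(1, min(step, len(order)))
    best_k, best_score = step, -np.inf
    for k in range(step, len(order) + 1, step):
        score = score_fn(X_maj[order[:k]], X_min)
        if score > best_score:
            best_k, best_score = k, score
    return order[:best_k], best_score


def gmean_nearest_centroid(X_maj_sel, X_min, X_maj_all):
    """Toy score: G-mean of a nearest-centroid classifier fitted on the selected subset."""
    c_maj, c_min = X_maj_sel.mean(axis=0), X_min.mean(axis=0)

    def predicted_minority(X):
        return np.linalg.norm(X - c_min, axis=1) < np.linalg.norm(X - c_maj, axis=1)

    tpr = predicted_minority(X_min).mean()
    tnr = 1.0 - predicted_minority(X_maj_all).mean()
    return float(np.sqrt(tpr * tnr))


# Usage on synthetic data: 200 majority points, 20 minority points.
rng = np.random.default_rng(0)
X_maj = rng.normal(loc=0.0, scale=1.0, size=(200, 2))
X_min = rng.normal(loc=2.0, scale=0.7, size=(20, 2))
selected, score = progressive_undersample(
    X_maj, X_min, lambda sel, mn: gmean_nearest_centroid(sel, mn, X_maj), step=20
)
print(len(selected), round(score, 3))
```

In this sketch the undersampling size is not fixed to the minority-class size; it is whichever prefix of the sequence scores best. A real evaluation would score on held-out data with a proper classifier rather than on the training set.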
Pages: 11
Related papers
50 in total
  • [41] Default forecasting based on a novel group feature selection method for imbalanced data
    Chi, Guotai
    Xing, Jin
    Pan, Ancheng
    JOURNAL OF CREDIT RISK, 2023, 19 (03): : 51 - 77
  • [42] A Novel Semi-Supervised Learning Method Based on Fast Search and Density Peaks
    Gao, Fei
    Huang, Teng
    Sun, Jinping
    Hussain, Amir
    Yang, Erfu
    Zhou, Huiyu
    COMPLEXITY, 2019, 2019
  • [43] An Earth mover's distance-based undersampling approach for handling class-imbalanced data
    Rekha G.
    Krishna Reddy V.
    Tyagi A.K.
    International Journal of Intelligent Information and Database Systems, 2020, 13 (2-4) : 376 - 392
  • [44] A hybrid imbalanced classification model based on data density
    Shi, Shengnan
    Li, Jie
    Zhu, Dan
    Yang, Fang
    Xu, Yong
    INFORMATION SCIENCES, 2023, 624 : 50 - 67
  • [45] Natural-neighborhood based, label-specific undersampling for imbalanced, multi-label data
    Sadhukhan, Payel
    Palit, Sarbani
    ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2024, 18 (03) : 723 - 744
  • [46] SCUT: Multi-Class Imbalanced Data Classification using SMOTE and Cluster-based Undersampling
    Agrawal, Astha
    Viktor, Herna L.
    Paquet, Eric
    2015 7TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (IC3K), 2015, : 226 - 233
  • [47] A Novel Density Peaks Clustering Algorithm Based on Local Reachability Density
    Hanqing Wang
    Bin Zhou
    Jianyong Zhang
    Ruixue Cheng
    International Journal of Computational Intelligence Systems, 2020, 13 : 690 - 697
  • [48] A Novel Density Peaks Clustering Algorithm Based on Local Reachability Density
    Wang, Hanqing
    Zhou, Bin
    Zhang, Jianyong
    Cheng, Ruixue
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2020, 13 (01) : 690 - 697
  • [49] Imbalanced Data Classification Method Based on LSSASMOTE
    Wang, Zhi
    Liu, Qicheng
    IEEE ACCESS, 2023, 11 : 32252 - 32260
  • [50] Density Peaks Clustering Based on Weighted Local Density Sequence and Nearest Neighbor Assignment
    Yu, Donghua
    Liu, Guojun
    Guo, Maozu
    Liu, Xiaoyan
    Yao, Shuang
    IEEE ACCESS, 2019, 7 : 34301 - 34317