A Cluster-Based Under-Sampling Algorithm for Class-Imbalanced Data

Cited by: 0
Authors
Guzman-Ponce, A. [1, 2]
Valdovinos, R. M. [1]
Sanchez, J. S. [2]
Affiliations
[1] Univ Autonoma Estado Mexico, Fac Ingn, Toluca, Mexico
[2] Univ Jaume 1, Inst New Imaging Technol, Dept Comp Languages & Syst, Castellon de La Plana, Spain
Keywords
Class imbalance; DBSCAN; Under-sampling; Noise filtering
DOI
10.1007/978-3-030-61705-9_25
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Resampling methods are among the most popular strategies for addressing the class imbalance problem. These methods aim to compensate for the imbalanced class distribution by over-sampling the minority class and/or under-sampling the majority class. This paper introduces a new under-sampling method based on the DBSCAN clustering algorithm. The main idea is to remove the majority class instances that DBSCAN identifies as noise. The proposed method is empirically compared with well-known state-of-the-art under-sampling algorithms on 25 benchmark databases, and the experimental results demonstrate its effectiveness in terms of sensitivity, specificity, and the geometric mean of individual accuracies.
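The idea stated in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the DBSCAN parameters are chosen or whether clustering runs on the majority class alone or on the whole data set. The sketch assumes the former, uses Euclidean distance, and takes `eps` and `min_pts` as user-supplied values. The function names `dbscan_noise_mask`, `undersample_majority`, and `g_mean` are hypothetical. The last function computes the geometric mean of sensitivity and specificity, which is the usual reading of "geometric mean of individual accuracies" for a two-class problem.

```python
import math


def dbscan_noise_mask(points, eps, min_pts):
    """Return a list of booleans, True where a point is DBSCAN noise.

    A point is a core point if at least min_pts points (itself included)
    lie within distance eps of it. A point is noise if it is not a core
    point and has no core point within eps.
    """
    n = len(points)
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_pts for nb in neighbors]
    return [
        not is_core[i] and not any(is_core[j] for j in neighbors[i])
        for i in range(n)
    ]


def undersample_majority(X, y, majority_label, eps, min_pts):
    """Drop majority-class instances flagged as noise; keep all others.

    Assumption (not stated in the abstract): DBSCAN is applied to the
    majority-class instances only.
    """
    maj_idx = [i for i, label in enumerate(y) if label == majority_label]
    noise = dbscan_noise_mask([X[i] for i in maj_idx], eps, min_pts)
    drop = {i for i, is_noise in zip(maj_idx, noise) if is_noise}
    keep = [i for i in range(len(y)) if i not in drop]
    return [X[i] for i in keep], [y[i] for i in keep]


def g_mean(y_true, y_pred, positive_label):
    """Geometric mean of sensitivity and specificity (two-class case).

    Assumes both classes occur in y_true, so neither denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive_label and p == positive_label for t, p in pairs)
    fn = sum(t == positive_label and p != positive_label for t, p in pairs)
    tn = sum(t != positive_label and p != positive_label for t, p in pairs)
    fp = sum(t != positive_label and p == positive_label for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return math.sqrt(sensitivity * specificity)


# Toy data: four majority points form a dense group, one majority point
# sits alone near the two minority points.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 1, 1]
X_res, y_res = undersample_majority(X, y, majority_label=0, eps=1.5, min_pts=3)
print(X_res, y_res)
```

In the toy data, the isolated majority point at (5, 5) has no neighbours within `eps`, so it is flagged as noise and removed, while the minority instances are left untouched.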
Pages: 299-311
Page count: 13