Overlap-Based Undersampling for Improving Imbalanced Data Classification

Cited by: 48
Authors
Vuttipittayamongkol, Pattaramon [1 ]
Elyan, Eyad [1 ]
Petrovski, Andrei [1 ]
Jayne, Chrisina [2 ]
Affiliations
[1] Robert Gordon Univ, Aberdeen, Scotland
[2] Oxford Brookes Univ, Oxford, England
Keywords
Undersampling; Overlap; Imbalanced data; Classification; Fuzzy C-means; Resampling
DOI
10.1007/978-3-030-03493-1_72
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification of imbalanced data remains an important field in machine learning. Several methods have been proposed to address the class imbalance problem including data resampling, adaptive learning and cost adjusting algorithms. Data resampling methods are widely used due to their simplicity and flexibility. Most existing resampling techniques aim at rebalancing class distribution. However, class imbalance is not the only factor that impacts the performance of the learning algorithm. Class overlap has proved to have a higher impact on the classification of imbalanced datasets than the dominance of the negative class. In this paper, we propose a new undersampling method that eliminates negative instances from the overlapping region and hence improves the visibility of the minority instances. Testing and evaluating the proposed method using 36 public imbalanced datasets showed statistically significant improvements in classification performance.
Pages: 689-697
Number of pages: 9
Related papers
50 results in total
  • [1] Improved Overlap-based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson's Disease
    Vuttipittayamongkol, Pattaramon
    Elyan, Eyad
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2020, 30 (08)
  • [3] Nearest neighbors and density-based undersampling for imbalanced data classification with class overlap
    Sun, Peiqi
    Du, Yanhui
    Xiong, Siyun
    NEUROCOMPUTING, 2024, 609
  • [4] Radial-Based Undersampling for imbalanced data classification
    Koziarski, Michal
    PATTERN RECOGNITION, 2020, 102
  • [5] Discussion on Vuttipittayamongkol, P. and Elyan, E., Improved Overlap-Based Undersampling for Imbalanced Dataset Classification with Application to Epilepsy and Parkinson's Disease
    Fernandez, Alberto
    INTERNATIONAL JOURNAL OF NEURAL SYSTEMS, 2020, 30 (09)
  • [6] Evolutionary Undersampling for Imbalanced Big Data Classification
    Triguero, I.
    Galar, M.
    Vluymans, S.
    Cornelis, C.
    Bustince, H.
    Herrera, F.
    Saeys, Y.
    2015 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2015 : 715 - 722
  • [7] Relevant information undersampling to support imbalanced data classification
    Hoyos-Osorio, J.
    Alvarez-Meza, A.
    Daza-Santacoloma, G.
    Orozco-Gutierrez, A.
    Castellanos-Dominguez, G.
    NEUROCOMPUTING, 2021, 436 : 136 - 146
  • [8] A Novel Selective Ensemble Algorithm for Imbalanced Data Classification Based on Exploratory Undersampling
    Yin, Qing-Yan
    Zhang, Jiang-She
    Zhang, Chun-Xia
    Ji, Nan-Nan
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2014, 2014
  • [9] An approach for classification of highly imbalanced data using weighting and undersampling
    Ashish Anand
    Ganesan Pugalenthi
    Gary B. Fogel
    P. N. Suganthan
    Amino Acids, 2010, 39 : 1385 - 1391
  • [10] An approach for classification of highly imbalanced data using weighting and undersampling
    Anand, Ashish
    Pugalenthi, Ganesan
    Fogel, Gary B.
    Suganthan, P. N.
    AMINO ACIDS, 2010, 39 (05) : 1385 - 1391