Overlap-Based Undersampling for Improving Imbalanced Data Classification

Cited by: 48
Authors
Vuttipittayamongkol, Pattaramon [1]
Elyan, Eyad [1]
Petrovski, Andrei [1]
Jayne, Chrisina [2]
Affiliations
[1] Robert Gordon Univ, Aberdeen, Scotland
[2] Oxford Brookes Univ, Oxford, England
Keywords
Undersampling; Overlap; Imbalanced data; Classification; Fuzzy C-means; Resampling
DOI
10.1007/978-3-030-03493-1_72
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Classification of imbalanced data remains an important field in machine learning. Several methods have been proposed to address the class imbalance problem including data resampling, adaptive learning and cost adjusting algorithms. Data resampling methods are widely used due to their simplicity and flexibility. Most existing resampling techniques aim at rebalancing class distribution. However, class imbalance is not the only factor that impacts the performance of the learning algorithm. Class overlap has proved to have a higher impact on the classification of imbalanced datasets than the dominance of the negative class. In this paper, we propose a new undersampling method that eliminates negative instances from the overlapping region and hence improves the visibility of the minority instances. Testing and evaluating the proposed method using 36 public imbalanced datasets showed statistically significant improvements in classification performance.
Pages: 689-697
Number of pages: 9
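
The abstract describes removing negative (majority) instances from the region where the classes overlap, and the keywords mention fuzzy C-means. The following is a minimal sketch of that general idea, not the authors' published algorithm: the abstract does not give the removal rule, so the membership cutoff, the two-cluster setup, and the names `fuzzy_cmeans`, `overlap_undersample`, and `threshold` are assumptions made for illustration.

```python
import numpy as np


def fuzzy_cmeans(X, n_clusters=2, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy c-means; returns (centers, membership matrix U)."""
    rng = np.random.default_rng(seed)
    # Random initial memberships, each row normalised to sum to 1.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        W = U ** m
        # Cluster centers are membership-weighted means of the data.
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # guard against division by zero
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U


def overlap_undersample(X, y, threshold=0.4, positive_label=1):
    """Drop negative instances that look like they sit in the overlap.

    Assumption (hypothetical rule): a negative instance is treated as
    overlapping when its membership in the cluster favoured by the
    positive class exceeds `threshold`.
    """
    _, U = fuzzy_cmeans(X, n_clusters=2)
    pos = y == positive_label
    # Cluster in which the positive instances have the higher mean membership.
    pos_cluster = U[pos].mean(axis=0).argmax()
    drop = (~pos) & (U[:, pos_cluster] > threshold)
    keep = ~drop
    return X[keep], y[keep]
```

Only negative instances are ever removed, so every minority instance is retained. A lower `threshold` removes more of the majority class; a value of 1.0 removes nothing, because memberships never exceed 1.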