Overlap-Based Undersampling for Improving Imbalanced Data Classification

Cited by: 48
Authors
Vuttipittayamongkol, Pattaramon [1]
Elyan, Eyad [1]
Petrovski, Andrei [1]
Jayne, Chrisina [2]
Affiliations
[1] Robert Gordon Univ, Aberdeen, Scotland
[2] Oxford Brookes Univ, Oxford, England
Keywords
Undersampling; Overlap; Imbalanced data; Classification; Fuzzy C-means; Resampling
DOI
10.1007/978-3-030-03493-1_72
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Classification of imbalanced data remains an important field in machine learning. Several methods have been proposed to address the class imbalance problem, including data resampling, adaptive learning, and cost-adjusting algorithms. Data resampling methods are widely used because of their simplicity and flexibility. Most existing resampling techniques aim to rebalance the class distribution. However, class imbalance is not the only factor that affects the performance of a learning algorithm. Class overlap has been shown to have a greater impact on the classification of imbalanced datasets than the dominance of the negative class. In this paper, we propose a new undersampling method that eliminates negative instances from the overlapping region and thereby improves the visibility of the minority instances. Testing and evaluating the proposed method on 36 public imbalanced datasets showed statistically significant improvements in classification performance.
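The abstract does not say how the overlapping region is identified, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes binary labels (0 = negative/majority, 1 = positive/minority), uses the two class means as fixed centers in place of a full Fuzzy C-means fit, and treats a negative instance as lying in the overlapping region when its fuzzy membership to the minority center exceeds a threshold. The function names, the `threshold` parameter, and the toy data are all hypothetical.

```python
import numpy as np


def fcm_memberships(X, centers, m=2.0):
    """Fuzzy C-means style membership of each row of X to each center.

    Uses the standard FCM membership formula with fuzzifier m > 1.
    """
    # Pairwise Euclidean distances, shape (n_samples, n_centers).
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.maximum(d, 1e-12)  # guard against a point sitting exactly on a center
    inv = d ** (-2.0 / (m - 1.0))
    return inv / inv.sum(axis=1, keepdims=True)


def overlap_undersample(X, y, threshold=0.5, m=2.0):
    """Drop negative instances whose minority membership exceeds threshold.

    Assumption: class means stand in for fitted FCM cluster centers.
    """
    centers = np.stack([X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)])
    u = fcm_memberships(X, centers, m)
    in_overlap = (y == 0) & (u[:, 1] > threshold)
    keep = ~in_overlap  # every minority instance is always kept
    return X[keep], y[keep]


# Toy data (hypothetical): five negatives near the origin, one negative
# sitting among the two positives near (2, 2).
X = np.array([
    [0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [-0.2, 0.0], [0.0, -0.2],
    [1.9, 2.0],
    [2.0, 2.0], [2.1, 2.0],
])
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])

X_res, y_res = overlap_undersample(X, y)
print(len(y), "->", len(y_res))
```

Only negative instances are ever removed, so the minority class is left intact and becomes easier to separate near the class boundary. Lowering `threshold` removes more negatives and widens the region treated as overlapping.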
Pages: 689-697
Page count: 9