Overlap-Based Undersampling for Improving Imbalanced Data Classification

Cited by: 48
Authors
Vuttipittayamongkol, Pattaramon [1]
Elyan, Eyad [1]
Petrovski, Andrei [1]
Jayne, Chrisina [2]
Affiliations
[1] Robert Gordon Univ, Aberdeen, Scotland
[2] Oxford Brookes Univ, Oxford, England
Keywords
Undersampling; Overlap; Imbalanced data; Classification; Fuzzy C-means; Resampling
DOI
10.1007/978-3-030-03493-1_72
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification of imbalanced data remains an important field in machine learning. Several methods have been proposed to address the class imbalance problem including data resampling, adaptive learning and cost adjusting algorithms. Data resampling methods are widely used due to their simplicity and flexibility. Most existing resampling techniques aim at rebalancing class distribution. However, class imbalance is not the only factor that impacts the performance of the learning algorithm. Class overlap has proved to have a higher impact on the classification of imbalanced datasets than the dominance of the negative class. In this paper, we propose a new undersampling method that eliminates negative instances from the overlapping region and hence improves the visibility of the minority instances. Testing and evaluating the proposed method using 36 public imbalanced datasets showed statistically significant improvements in classification performance.
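The idea in the abstract, removing negative (majority) instances that fall in the region shared with the minority class, can be illustrated with a short Python sketch. This is not the authors' algorithm: this record does not describe it, and the keywords suggest it relies on fuzzy C-means. The sketch swaps in a simple nearest-neighbour heuristic instead, treating a negative instance as "overlapping" when any of its k nearest neighbours belongs to the minority class. The function name `overlap_undersample` and the parameters `k` and `minority_label` are hypothetical.

```python
import numpy as np

def overlap_undersample(X, y, k=3, minority_label=1):
    """Drop negative instances that have a minority instance among their
    k nearest neighbours (an assumed stand-in for an overlap criterion)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    # Pairwise Euclidean distances; fine for small data, O(n^2) memory.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # an instance is not its own neighbour

    # Indices of the k nearest neighbours of every instance.
    nn = np.argsort(dist, axis=1)[:, :k]

    has_minority_neighbour = (y[nn] == minority_label).any(axis=1)
    is_negative = y != minority_label

    # Keep all minority instances and every negative outside the overlap.
    keep = ~(is_negative & has_minority_neighbour)
    return X[keep], y[keep]

# Toy usage: four negatives far from the minority class, one in between
# the two minority instances.
X = np.array([[0.0], [0.1], [0.2], [0.3], [5.0], [5.1], [5.05]])
y = np.array([0, 0, 0, 0, 1, 1, 0])
X_res, y_res = overlap_undersample(X, y, k=2)
# The negative at 5.05 is dropped; the other six instances remain.
```

Only negative instances are ever removed, so the minority class is left intact and becomes easier for a classifier to see near the class boundary. How aggressively the overlap region is cleared is governed here by `k`, which is a choice of this sketch rather than something stated in the record.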
Pages: 689-697
Number of pages: 9
Related papers
50 in total
  • [21] Overlap-Based Cell Tracker
    Chalfoun, Joe
    Cardone, Antonio
    Dima, Alden A.
    Allen, Daniel P.
    Halter, Michael W.
    JOURNAL OF RESEARCH OF THE NATIONAL INSTITUTE OF STANDARDS AND TECHNOLOGY, 2010, 115 (06) : 477 - 486
  • [22] Improving undersampling-based ensemble with rotation forest for imbalanced problem
    Guo, Huaping
    Diao, Xiaoyu
    Liu, Hongbing
    TURKISH JOURNAL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCES, 2019, 27 (02) : 1371 - 1386
  • [23] Subclass-based Undersampling for Class-imbalanced Image Classification
    Lehmann, Daniel
    Ebner, Marc
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 5, 2022 : 493 - 500
  • [24] A Membership Probability-Based Undersampling Algorithm for Imbalanced Data
    Ahn, Gilseung
    Park, You-Jin
    Hur, Sun
    JOURNAL OF CLASSIFICATION, 2021, 38 (01) : 2 - 15
  • [25] Clustering-based undersampling in class-imbalanced data
    Lin, Wei-Chao
    Tsai, Chih-Fong
    Hu, Ya-Han
    Jhang, Jing-Shang
    INFORMATION SCIENCES, 2017, 409 : 17 - 26
  • [26] Undersampling method based on minority class density for imbalanced data
    Sun, Zhongqiang
    Ying, Wenhao
    Zhang, Wenjin
    Gong, Shengrong
    EXPERT SYSTEMS WITH APPLICATIONS, 2024, 249
  • [27] Evolutionary Undersampling for Extremely Imbalanced Big Data Classification under Apache Spark
    Triguero, I.
    Galar, M.
    Merino, D.
    Maillo, J.
    Bustince, H.
    Herrera, F.
    2016 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2016 : 640 - 647
  • [28] Hashing-Based Undersampling Ensemble for Imbalanced Pattern Classification Problems
    Ng, Wing W. Y.
    Xu, Shichao
    Zhang, Jianjun
    Tian, Xing
    Rong, Tongwen
    Kwong, Sam
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (02) : 1269 - 1279
  • [29] SCUT: Multi-Class Imbalanced Data Classification using SMOTE and Cluster-based Undersampling
    Agrawal, Astha
    Viktor, Herna L.
    Paquet, Eric
    2015 7TH INTERNATIONAL JOINT CONFERENCE ON KNOWLEDGE DISCOVERY, KNOWLEDGE ENGINEERING AND KNOWLEDGE MANAGEMENT (IC3K), 2015 : 226 - 233
  • [30] OverlapShard: Overlap-based Sharding Mechanism
    Liu, Yanan
    Sun, Huiping
    Song, Xu
    Chen, Zhong
    26TH IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (IEEE ISCC 2021), 2021