Sampling Approaches for Imbalanced Data Classification Problem in Machine Learning

Cited by: 75
Authors
Tyagi, Shivani [1]
Mittal, Sangeeta [1]
Affiliations
[1] Jaypee Institute of Information Technology, Department of Computer Science and Engineering, Noida, UP, India
Source
PROCEEDINGS OF RECENT INNOVATIONS IN COMPUTING, ICRIC 2019 | 2020 / Vol. 597
Keywords
Imbalanced dataset; Machine learning; Resampling; Undersampling; Oversampling
DOI
10.1007/978-3-030-29407-6_17
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Real-world datasets in many domains, such as medicine, intrusion detection, fraudulent transactions and bioinformatics, are highly imbalanced. In classification problems, imbalanced datasets negatively affect the accuracy of class predictions. This skewness can be handled either by oversampling minority class examples or by undersampling the majority class. In this work, popular methods of both categories have been evaluated for their capability of improving the imbalance ratio of five highly imbalanced datasets from different application domains. The effect of balancing on classification results has also been investigated. It has been observed that the adaptive synthetic oversampling approach can best improve the imbalance ratio as well as classification results. However, undersampling approaches gave better overall performance on all datasets.
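To make the two families of methods in the abstract concrete, the following is a minimal sketch in Python, not the authors' implementation. It uses plain random oversampling and random undersampling, which are simpler than the adaptive synthetic approach and the other methods the paper evaluates. The function names, the toy data, and the definition of imbalance ratio as majority count divided by minority count are assumptions made for illustration; the abstract does not define the ratio.

```python
import random
from collections import Counter


def imbalance_ratio(labels):
    """Assumed definition: largest class count / smallest class count (1.0 = balanced)."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen examples of smaller classes up to the largest class size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        extra = [rng.choice(pool) for _ in range(target - n)]
        out_x.extend(extra)
        out_y.extend([cls] * len(extra))
    return out_x, out_y


def random_undersample(samples, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_x, out_y = [], []
    for cls in counts:
        pool = [x for x, y in zip(samples, labels) if y == cls]
        out_x.extend(rng.sample(pool, target))
        out_y.extend([cls] * target)
    return out_x, out_y


# Hypothetical toy dataset: 10 minority (label 1) and 90 majority (label 0) examples.
X = [[i] for i in range(100)]
y = [1] * 10 + [0] * 90

X_over, y_over = random_oversample(X, y)
X_under, y_under = random_undersample(X, y)

print(imbalance_ratio(y))        # ratio before resampling
print(imbalance_ratio(y_over))   # ratio after oversampling
print(imbalance_ratio(y_under))  # ratio after undersampling
```

Under these assumptions, oversampling grows the dataset to twice the majority count, while undersampling shrinks it to twice the minority count and discards majority examples. That difference in retained information is the trade-off behind comparing the two families.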
Pages: 209-221
Number of pages: 13