Applying support vector machines to imbalanced datasets

Cited by: 686
Authors
Akbani, R
Kwek, S
Japkowicz, N
Affiliations
[1] Univ Texas, Dept Comp Sci, San Antonio, TX 78249 USA
[2] Univ Ottawa, Sch Informat Technol & Engn, Ottawa, ON K1N 6N5, Canada
Source
MACHINE LEARNING: ECML 2004, PROCEEDINGS | 2004 / Vol. 3201
DOI
10.1007/978-3-540-30115-8_7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Support Vector Machines (SVM) have been extensively studied and have shown remarkable success in many applications. However, the success of SVM is very limited when it is applied to the problem of learning from imbalanced datasets in which negative instances heavily outnumber the positive instances (e.g., in gene profiling and detecting credit card fraud). This paper discusses the factors behind this failure and explains why the common strategy of undersampling the training data may not be the best choice for SVM. We then propose an algorithm for overcoming these problems which is based on a variant of the SMOTE algorithm by Chawla et al., combined with Veropoulos et al.'s different error costs algorithm. We compare the performance of our algorithm against these two algorithms, along with undersampling and regular SVM, and show that our algorithm outperforms all of them.
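The abstract names two ingredients, SMOTE-style oversampling and different error costs per class, without giving details. The sketch below is only an illustration of those two general ideas under stated assumptions, not the authors' algorithm: it uses the basic SMOTE interpolation step (the paper's specific variant is not described here), and the helper `error_cost_ratio`, which sets the cost ratio to the class-size ratio, is a common heuristic assumed for illustration. All function names are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority points, SMOTE-style.

    Each synthetic point lies on the segment between a randomly chosen
    minority point and one of its k nearest minority neighbours.
    Assumes `minority` holds at least two points (tuples of floats).
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority points
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def error_cost_ratio(n_pos, n_neg):
    """Assumed heuristic: penalise errors on the rare positive class
    n_neg / n_pos times more than errors on the negative class."""
    return n_neg / n_pos


# Toy data: 3 positive (minority) points against 30 negatives.
positives = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
new_points = smote_like_oversample(positives, n_new=6, k=2)
ratio = error_cost_ratio(n_pos=len(positives), n_neg=30)
```

In a library that supports per-class penalties, a ratio like this would typically be passed as a class weight when training the SVM on the oversampled data; which values the paper actually uses is not stated in this record.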
Pages: 39-50
Number of pages: 12
Related papers
50 in total
  • [1] Applying Instance-weighted Support Vector Machines to Class Imbalanced Datasets
    Wang, Xiaoguang
    Liu, Xuan
    Matwin, Stan
    Japkowicz, Nathalie
    2014 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2014,
  • [2] Support vector machines for credit risk assessment with imbalanced datasets
    Khemakhem, Sihem
    Boujelbene, Younes
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2018, 10 (02) : 171 - 187
  • [3] Efficient Resampling Methods for Training Support Vector Machines with Imbalanced Datasets
    Batuwita, Rukshan
    Palade, Vasile
    2010 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS IJCNN 2010, 2010,
  • [4] Support Vector Machine Failure in Imbalanced Datasets
    Illan, I. A.
    Gorriz, J. M.
    Ramirez, J.
    Martinez-Murcia, F. J.
    Castillo-Barnes, D.
    Segovia, F.
    Salas-Gonzalez, D.
    UNDERSTANDING THE BRAIN FUNCTION AND EMOTIONS, PT I, 2019, 11486 : 412 - 419
  • [5] Combine Vector Quantization and Support Vector Machine for imbalanced datasets
    Yu, Ting
    Debenham, John
    Jan, Tony
    Simoff, Simeon
    ARTIFICIAL INTELLIGENCE IN THEORY AND PRACTICE, 2006, 217 : 81 - +
  • [6] Constructing Support Vector Machines Ensemble Classification Method for Imbalanced Datasets Based on Fuzzy Integral
    Chen, Pu
    Zhang, Dayong
    MODERN ADVANCES IN APPLIED INTELLIGENCE, IEA/AIE 2014, PT I, 2014, 8481 : 70 - 76
  • [7] Balance method for imbalanced support vector machines
    Department of Applied Mathematics, Xidian University, Xi'an 710071, China
    Unknown
    Unknown
    Moshi Shibie yu Rengong Zhineng, 2008, 2 (136-141):
  • [8] Applying Resampling Methods for Imbalanced Datasets to Not So Imbalanced Datasets
    Arbelaitz, Olatz
    Gurrutxaga, Ibai
    Muguerza, Javier
    Maria Perez, Jesus
    ADVANCES IN ARTIFICIAL INTELLIGENCE, CAEPIA 2013, 2013, 8109 : 111 - 120
  • [9] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    KNOWLEDGE AND INFORMATION SYSTEMS, 2010, 25 (01) : 1 - 20
  • [10] Boosting support vector machines for imbalanced data sets
    Wang, Benjamin X.
    Japkowicz, Nathalie
    FOUNDATIONS OF INTELLIGENT SYSTEMS, PROCEEDINGS, 2008, 4994 : 38 - 47