Classification on Imbalanced Data Sets, Taking Advantage of Errors to Improve Performance

Cited by: 0
Authors
Lopez-Chau, Asdrubal [1]
Garcia-Lamont, Farid [2]
Cervantes, Jair [2]
Affiliations
[1] Univ Autonoma Estado Mexico, Ctr Univ UAEM, Zumpango 55600, Estado De Mexic, Mexico
[2] Univ Autonoma Estado Mexico, Ctr Univ UAEM, Texcoco 56159, Estado De Mexic, Mexico
Source
ADVANCED INTELLIGENT COMPUTING THEORIES AND APPLICATIONS, ICIC 2015, PT III | 2015, Vol. 9227
Keywords
Imbalanced; Classification; Synthetic instances
DOI
10.1007/978-3-319-22053-6_8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification methods usually perform poorly when applied to imbalanced data sets. Several algorithms have been proposed in the last decade to overcome this problem. Most of them generate synthetic instances to balance the data set, regardless of the classification algorithm. These methods work reasonably well in most cases; however, they tend to cause over-fitting. In this paper, we propose a method to address the imbalance problem. Our approach, which is very simple to implement, works in two phases. The first phase detects instances that are difficult for classification methods to predict correctly. These instances are then categorized as "noisy" or "secure", where the former are instances most of whose nearest neighbors belong to the opposite class. The second phase generates a number of synthetic instances for each instance that is difficult to predict correctly. After applying our method to data sets, the AUC (area under the ROC curve) of the classifiers improves dramatically. We compare our method with other state-of-the-art methods on more than 10 data sets.
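The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say which classifier detects the difficult instances, how neighbors are counted, or how synthetic instances are produced. The sketch assumes a leave-one-out nearest-centroid classifier as the detector, a k-nearest-neighbor majority test for the "noisy"/"secure" split, and SMOTE-style interpolation toward same-class instances as the generator. All function names and the parameters `k`, `n_new`, and `seed` are hypothetical.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def hard_instances(X, y):
    """Phase 1a. ASSUMPTION: 'difficult to predict' is approximated by the
    leave-one-out errors of a nearest-centroid classifier."""
    hard = []
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            pts = [X[j] for j in range(len(X)) if j != i and y[j] == label]
            if pts:
                centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]
        pred = min(centroids, key=lambda c: dist(X[i], centroids[c]))
        if pred != y[i]:
            hard.append(i)
    return hard


def categorize(X, y, i, k=3):
    """Phase 1b. 'noisy' if most of the k nearest neighbors of instance i
    belong to the opposite class, otherwise 'secure'."""
    others = sorted((j for j in range(len(X)) if j != i),
                    key=lambda j: dist(X[i], X[j]))
    neighbors = others[:k]
    opposite = sum(1 for j in neighbors if y[j] != y[i])
    return "noisy" if opposite > len(neighbors) / 2 else "secure"


def synthesize(X, y, i, n_new, rng):
    """Phase 2. ASSUMPTION: each synthetic instance is a random point on the
    segment between instance i and a randomly chosen same-class instance."""
    mates = [X[j] for j in range(len(X)) if j != i and y[j] == y[i]]
    new_points = []
    for _ in range(n_new):
        target = rng.choice(mates) if mates else X[i]
        t = rng.random()
        new_points.append([a + t * (b - a) for a, b in zip(X[i], target)])
    return new_points


def augment(X, y, k=3, n_new=2, seed=0):
    """Run both phases; return the augmented data and the category of each
    difficult instance, keyed by its index in the original data."""
    rng = random.Random(seed)
    X_new, y_new, tags = [list(x) for x in X], list(y), {}
    for i in hard_instances(X, y):
        tags[i] = categorize(X, y, i, k)
        for point in synthesize(X, y, i, n_new, rng):
            X_new.append(point)
            y_new.append(y[i])
    return X_new, y_new, tags
```

Calling `augment(X, y)` on a list of feature vectors `X` and labels `y` returns the enlarged training set, which could then be passed to any classifier. Whether "noisy" instances should receive fewer synthetic neighbors than "secure" ones is not stated in the abstract, so the sketch treats both categories the same way.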
Pages: 72-78
Page count: 7