An overlap-sensitive margin classifier for imbalanced and overlapping data

Cited by: 90
Authors
Lee, Han Kyu [1]
Kim, Seoung Bum [1]
Affiliation
[1] Korea Univ, Sch Ind Management Engn, 145 Anamro, Seoul 02841, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Classification; Imbalanced class; Overlapping class; Support vector machine; SUPPORT VECTOR MACHINES; SVM;
DOI
10.1016/j.eswa.2018.01.008
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification code
081104; 0812; 0835; 1405;
Abstract
Classification is an important task in many domains. In many real-world applications, class imbalance and class overlap have been reported as major obstacles to applying traditional classification algorithms. An imbalance problem occurs when the training data contain considerably more instances of one class than of the other classes. Class overlap occurs when a region of the data space contains similar numbers of instances from each class. When class overlap occurs in imbalanced data sets, classification becomes even more difficult. Although various approaches have been proposed to address class imbalance and class overlap separately, only a few studies have attempted to address both problems simultaneously. In this paper, we propose an overlap-sensitive margin (OSM) classifier, based on a modified fuzzy support vector machine (SVM) and the k-nearest neighbor (k-NN) algorithm, for imbalanced and overlapping data sets. The main idea of the OSM classifier is to use the modified fuzzy SVM algorithm to separate the data space into soft- and hard-overlap regions. The separated regions are then classified using the decision boundaries of the SVM and 1-nearest neighbor (1-NN) algorithms. Furthermore, separating a data set into soft- and hard-overlap regions shows which part of the data should be examined more closely in real-world classification tasks. Experiments on synthetic and real-world data sets demonstrated that the proposed OSM classifier outperformed existing methods in imbalanced and overlapping situations. (C) 2018 Elsevier Ltd. All rights reserved.
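The hybrid decision rule summarized in the abstract can be illustrated with a minimal, hypothetical Python sketch. This is not the authors' implementation. It assumes that a linear decision function f(x) = w·x + b has already been obtained from a trained (fuzzy) SVM. It also treats points with |f(x)| < 1, that is, points inside the SVM margin, as the hard-overlap region, labels those points with 1-NN over all training points, and labels all other points by the sign of f(x). The names `linear_decision`, `one_nn`, `osm_style_predict`, the `margin` parameter, and the toy weights and data are illustrative assumptions. The sketch does not reproduce the modified fuzzy SVM training step or the paper's exact definition of the soft- and hard-overlap regions.

```python
import math
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def linear_decision(w: Sequence[float], b: float) -> Callable[[Point], float]:
    """Return f(x) = w.x + b. Assumption: w and b come from an SVM trained elsewhere."""
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b


def one_nn(train_X: Sequence[Point], train_y: Sequence[int], x: Point) -> int:
    """Label of the single nearest training point under Euclidean distance."""
    nearest = min(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    return train_y[nearest]


def osm_style_predict(
    x: Point,
    f: Callable[[Point], float],
    train_X: Sequence[Point],
    train_y: Sequence[int],
    margin: float = 1.0,
) -> int:
    """Hypothetical hybrid rule: 1-NN inside the margin, SVM sign outside it."""
    score = f(x)
    if abs(score) < margin:
        # Assumed hard-overlap region: the SVM boundary is least reliable here,
        # so defer to the 1-NN label (searching all training points).
        return one_nn(train_X, train_y, x)
    # Assumed soft-overlap region: trust the SVM decision boundary.
    return 1 if score > 0 else -1


# Toy data (hypothetical): labels are -1 / +1 and the classes overlap near x0 = 0.
X = [(-2.0, 0.0), (-0.5, 0.3), (0.4, 0.1), (2.0, 0.0)]
y = [-1, 1, -1, 1]
f = linear_decision(w=(1.0, 0.0), b=0.0)

print(osm_style_predict((-0.4, 0.3), f, X, y))  # 1: inside the margin, 1-NN overrides sign(f) = -1
print(osm_style_predict((3.0, 0.0), f, X, y))   # 1: outside the margin, sign(f) = +1
print(osm_style_predict((-3.0, 0.0), f, X, y))  # -1: outside the margin, sign(f) = -1
```

In this toy case, the query (-0.4, 0.3) lies on the negative side of the linear boundary, but its nearest training point is the positive (minority-like) instance at (-0.5, 0.3). The margin region is therefore where the two rules can disagree, which is the kind of region the abstract suggests deserves closer examination.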
Pages: 72-83
Number of pages: 12
References
57 in total