A Quasi-linear SVM Combined with Assembled SMOTE for Imbalanced Data Classification

Cited by: 0
Authors
Zhou, Bo [1]
Yang, Cheng [1]
Guo, Haixiang [2]
Hu, Jinglu [1]
Affiliations
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Wakamatsu Ku, Hibikino 2-7, Kitakyushu, Fukuoka, Japan
[2] China Univ Geosci, Sch Econ & Management, Wuhan 430074, Peoples R China
Source
2013 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2013
Funding
National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper addresses the classification of imbalanced datasets using an SVM combined with oversampling. Traditional oversampling methods increase the overlap between classes, which leads to poor generalization of the SVM classifier. To solve this problem, this paper proposes a method that combines a quasi-linear SVM with assembled SMOTE. The quasi-linear SVM is an SVM with a quasi-linear kernel function; it approximates a nonlinear separation boundary by interpolating multiple local linear boundaries. The assembled SMOTE performs oversampling that takes the data distribution into account and avoids creating overlap between classes. First, a partition method based on the minimum spanning tree is proposed to obtain local linear partitions, each of which can be separated by a single linear boundary. Second, using the information from the local linear partitions, the assembled SMOTE generates synthetic minority-class samples. Finally, the quasi-linear SVM classifies the oversampled dataset in the same way as a standard SVM, using a composite quasi-linear kernel function. Experimental results on artificial data and benchmark datasets show that the proposed method is effective and improves classification performance.
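The abstract does not give the details of the assembled SMOTE, so the following is only a minimal sketch of the general idea it describes: SMOTE-style interpolation restricted to minority samples that share a local partition, so that synthetic points do not fall between partitions. The function name `partition_smote`, its parameters, and the assumption that partition labels are already available (for example from some prior clustering step) are all hypothetical and are not taken from the paper.

```python
import numpy as np


def partition_smote(X_min, part_labels, n_new, k=3, seed=0):
    """SMOTE-style oversampling that interpolates only between minority
    samples sharing the same partition label (illustrative assumption,
    not the paper's algorithm)."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    part_labels = np.asarray(part_labels)
    # Only partitions with at least two minority samples can be interpolated.
    parts = [p for p in np.unique(part_labels) if np.sum(part_labels == p) >= 2]
    if not parts:
        raise ValueError("need a partition with at least two minority samples")
    synthetic = []
    for _ in range(n_new):
        p = parts[rng.integers(len(parts))]
        members = X_min[part_labels == p]
        i = rng.integers(len(members))
        # Distances from the chosen sample to the others in its partition.
        d = np.linalg.norm(members - members[i], axis=1)
        d[i] = np.inf  # exclude the sample itself
        neighbours = np.argsort(d)[: min(k, len(members) - 1)]
        j = neighbours[rng.integers(len(neighbours))]
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(members[i] + gap * (members[j] - members[i]))
    return np.array(synthetic)


# Two well-separated minority partitions; labels are supplied by hand here.
X = [[0, 0], [1, 0], [0, 1], [10, 10], [11, 10], [10, 11]]
labels = [0, 0, 0, 1, 1, 1]
S = partition_smote(X, labels, n_new=20)
```

Because each synthetic point is a convex combination of two samples from the same partition, none of the generated points can land in the gap between the two groups, which is the overlap-avoiding behaviour the abstract attributes to distribution-aware oversampling.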
Pages: 7