Combination of feature selection approaches with SVM in credit scoring

Cited by: 133
Authors
Chen, Fei-Long [1]
Li, Feng-Chia [1, 2]
Affiliations
[1] Natl Tsing Hua Univ, Dept Ind Engn & Engn Management, Hsinchu, Taiwan
[2] Jen Teh Jr Coll, Dept Informat Management, Taipei, Taiwan
Keywords
Support vector machine; Linear discriminant analysis; Decision tree; Rough sets theory; F-score; SUPPORT VECTOR MACHINES; MODELS; CLASSIFICATION; DIAGNOSIS
DOI
10.1016/j.eswa.2009.12.025
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Credit scoring is regarded as a critical topic, and the departments responsible for it collect large amounts of data to avoid wrong decisions. An effective classification model helps managers decide objectively rather than by intuition alone. This study proposes four feature selection approaches, each combined with a support vector machine (SVM) classifier, that retain sufficient information for classification. Different credit scoring models are constructed by selecting attributes with each of the four approaches. Two UCI (University of California, Irvine) data sets are used to evaluate the accuracy of the resulting hybrid SVM models. The SVM classifier is combined with conventional statistical LDA (linear discriminant analysis), decision tree, rough sets, and F-score approaches, each serving as a feature pre-processing step that optimizes the feature space by removing both irrelevant and redundant features. The paper describes the procedure of each proposed approach and evaluates its performance. The results of the hybrid SVM models are compared, and the nonparametric Wilcoxon signed-rank test is applied to determine whether the models differ significantly. The results suggest that the hybrid credit scoring approach is mostly robust and effective in finding optimal feature subsets and is a promising method for data mining. (C) 2009 Elsevier Ltd. All rights reserved.
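The abstract does not spell out how the F-score pre-processing step is computed, so the following is only a minimal sketch under stated assumptions. It uses the F-score definition commonly attributed to Chen and Lin (squared distance of each class mean from the overall mean, divided by the sum of the within-class sample variances), which may differ from the exact procedure in the paper. The function names `f_scores` and `top_k_features` and the toy data are hypothetical, chosen for illustration. In a hybrid model of the kind described, the selected columns would then be passed to an SVM classifier.

```python
from statistics import mean, variance


def f_scores(X, y):
    """Return one F-score per feature column for a two-class problem.

    X is a list of rows (lists of floats); y holds labels, 1 = positive
    and 0 = negative. Each class needs at least two samples, because the
    sample variance (n - 1 denominator) is undefined otherwise.
    Assumption: this follows the commonly used Chen and Lin definition,
    not necessarily the exact procedure of the paper.
    """
    pos = [row for row, label in zip(X, y) if label == 1]
    neg = [row for row, label in zip(X, y) if label == 0]
    scores = []
    for j in range(len(X[0])):
        col_all = [row[j] for row in X]
        col_pos = [row[j] for row in pos]
        col_neg = [row[j] for row in neg]
        m_all, m_pos, m_neg = mean(col_all), mean(col_pos), mean(col_neg)
        # Numerator: how far each class mean lies from the overall mean.
        between = (m_pos - m_all) ** 2 + (m_neg - m_all) ** 2
        # Denominator: spread of the feature inside each class.
        within = variance(col_pos) + variance(col_neg)
        if within > 0:
            scores.append(between / within)
        else:
            # No spread inside either class: perfectly separating or constant.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def top_k_features(X, y, k):
    """Return the column indices of the k highest-scoring features, sorted."""
    scores = f_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: column 0 separates the classes, column 1 does not.
X_toy = [[1.0, 5.0], [2.0, 3.0], [8.0, 4.0], [9.0, 6.0]]
y_toy = [0, 0, 1, 1]
selected = top_k_features(X_toy, y_toy, k=1)
print(selected)
```

A larger F-score indicates a feature whose values differ more between the two classes relative to their spread within each class. The score treats each feature independently, so it does not by itself detect redundancy between features.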
Pages: 4902-4909
Number of pages: 8