Combination of feature selection approaches with SVM in credit scoring

Cited by: 138
Authors
Chen, Fei-Long [1]
Li, Feng-Chia [1, 2]
Affiliations
[1] Natl Tsing Hua Univ, Dept Ind Engn & Engn Management, Hsinchu, Taiwan
[2] Jen Teh Jr Coll, Dept Informat Management, Taipei, Taiwan
Keywords
Support vector machine; Linear discriminant analysis; Decision tree; Rough sets theory; F-score; SUPPORT VECTOR MACHINES; MODELS; CLASSIFICATION; DIAGNOSIS
DOI
10.1016/j.eswa.2009.12.025
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Credit scoring is regarded as a critical topic, and the departments responsible for it collect large amounts of data to avoid wrong decisions. An effective classification model gives managers an objective basis for decisions in place of intuitive experience. This study proposes four feature selection approaches, each combined with a support vector machine (SVM) classifier, that retain sufficient information for classification. Different credit scoring models are constructed by selecting attributes with each of the four approaches. Two UCI (University of California, Irvine) data sets are used to evaluate the accuracy of the resulting hybrid SVM models. The SVM classifier is combined with conventional statistical linear discriminant analysis (LDA), decision tree, rough sets, and F-score approaches as a feature pre-processing step that optimizes the feature space by removing both irrelevant and redundant features. The paper describes the procedure of each proposed approach and evaluates its performance. The results of the hybrid models are compared, and the nonparametric Wilcoxon signed rank test is used to check whether there is any significant difference between the models. The results suggest that the hybrid credit scoring approach is mostly robust and effective in finding optimal feature subsets and is a promising method for data mining. (C) 2009 Elsevier Ltd. All rights reserved.
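As a rough illustration of the F-score pre-processing step mentioned in the abstract, the sketch below ranks features by a two-class F-score. The abstract does not give the formula, so the definition used here (squared distances of the class means from the overall mean, divided by the sum of the within-class sample variances) is an assumption, as are the function names `f_score` and `rank_features` and the toy data. In a hybrid model of the kind described, the top-ranked features would then be passed to an SVM classifier, which is not shown here.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature, given its values in the positive and negative class.

    Assumed definition: between-class separation divided by the sum of the
    within-class sample variances. Each class needs at least two values.
    """
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        # Degenerate case: no spread inside either class.
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(rows, labels):
    """Return (feature indices sorted by descending F-score, list of F-scores)."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        scores.append(f_score(pos, neg))
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data, made up for illustration: feature 0 separates the two classes,
# feature 1 has the same mean in both classes.
rows = [[1.0, 5.0], [2.0, 3.0], [3.0, 4.0], [7.0, 4.0], [8.0, 5.0], [9.0, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
order, scores = rank_features(rows, labels)
print(order, scores)
```

A larger score indicates a feature whose class means are further apart relative to its within-class spread; a cutoff on the ranking, which this sketch leaves open, would decide how many features are kept.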
Pages: 4902-4909
Number of pages: 8