A data driven ensemble classifier for credit scoring analysis

Cited by: 110
Authors
Hsieh, Nan-Chen [1]
Hung, Lun-Ping [2]
Affiliations
[1] National Taipei College of Nursing, Department of Information Management, Taipei 11257, Taiwan
[2] Technology and Science Institute of Northern Taiwan, Department of Information Management, Taipei 112, Taiwan
Keywords
Clustering; Ensemble classifier; Neural network; Bayesian network; Class-wise classification; Credit scoring system; NEURAL-NETWORKS; BANKRUPTCY PREDICTION; DISCRIMINANT-ANALYSIS; BAYESIAN NETWORKS; MINING APPROACH; MODELS
DOI
10.1016/j.eswa.2009.05.059
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This study focuses on predicting whether a credit applicant can be categorized as good, bad or borderline from information initially supplied. This is essentially a classification task for credit scoring. Given its importance, many researchers have recently worked on an ensemble of classifiers. However, to the best of our knowledge, unrepresentative samples drastically reduce the accuracy of the deployment classifier. Few have attempted to preprocess the input samples into more homogeneous cluster groups and then fit the ensemble classifier accordingly. For this reason, we introduce the concept of class-wise classification as a preprocessing step in order to obtain an efficient ensemble classifier. This strategy would work better than a direct ensemble of classifiers without the preprocessing step. The proposed ensemble classifier is constructed by incorporating several data mining techniques, mainly involving optimal associate binning to discretize continuous values; neural network, support vector machine, and Bayesian network are used to augment the ensemble classifier. In particular, the Markov blanket concept of Bayesian network allows for a natural form of feature selection, which provides a basis for mining association rules. The learned knowledge is represented in multiple forms, including causal diagram and constrained association rules. The data driven nature of the proposed system distinguishes it from existing hybrid/ensemble credit scoring systems. (C) 2009 Elsevier Ltd. All rights reserved.
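The abstract notes that the Markov blanket of a Bayesian network gives a natural form of feature selection. As a rough illustration only, the sketch below computes the Markov blanket of a target node (its parents, its children, and the other parents of those children) in a small hand-written network. The variable names and the network structure are hypothetical and are not taken from the paper; the function `markov_blanket` is likewise an illustrative helper, not the authors' implementation.

```python
from typing import Dict, Set


def markov_blanket(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return the Markov blanket of `target` in a DAG.

    `parents` maps each node to the set of its direct parents.
    The blanket is: parents of target, children of target,
    and the other parents of those children.
    """
    blanket = set(parents.get(target, set()))  # direct parents
    for node, node_parents in parents.items():
        if target in node_parents:             # node is a child of target
            blanket.add(node)
            blanket |= node_parents            # co-parents of that child
    blanket.discard(target)                    # target is never in its own blanket
    return blanket


# Hypothetical credit-scoring network (structure assumed for illustration).
toy_dag = {
    "credit_class": {"income", "employment_years"},
    "default_history": {"credit_class", "loan_amount"},
    "income": {"age"},
    "employment_years": set(),
    "loan_amount": set(),
    "age": set(),
    "telephone": set(),
}

selected_features = sorted(markov_blanket(toy_dag, "credit_class"))
print(selected_features)
```

In this toy network, `age` and `telephone` fall outside the blanket of `credit_class`, so a blanket-based selection step would drop them; given the blanket variables, the target is conditionally independent of every other node.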
Pages: 534-545
Page count: 12