Approaches for credit scorecard calibration: An empirical analysis

Citations: 35
Authors
Beque, Artem [1]
Coussement, Kristof [2]
Gayler, Ross
Lessmann, Stefan [1]
Affiliations
[1] Humboldt Univ, Sch Business & Econ, Unter den Linden 6, D-10099 Berlin, Germany
[2] Univ Catholique Lille, UMR CNRS 9221, IESEG Sch Management, Dept Mkt,LEM, 3 Rue Digue, F-59000 Lille, France
Keywords
Credit scoring; Classification; Calibration; Probability of default; ART CLASSIFICATION ALGORITHMS; NEURAL-NETWORKS; RISK; CLASSIFIERS; PROBABILITIES; MODELS; ACCURACY;
DOI
10.1016/j.knosys.2017.07.034
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Financial institutions use credit scorecards for risk management. A scorecard is a data-driven model for predicting default probabilities. Scorecard assessment concentrates on how well a scorecard discriminates good and bad risk. Whether predicted and observed default probabilities agree (i.e., calibration) is an equally important yet often overlooked dimension of scorecard performance. Surprisingly, no attempt has been made to systematically explore different calibration methods and their implications in credit scoring. The goal of the paper is to integrate previous work on probability calibration, to re-introduce available calibration techniques to the credit scoring community, and to empirically examine the extent to which they improve scorecards. More specifically, using real-world credit scoring data, we first develop scorecards using different classifiers, next apply calibration methods to the classifier predictions, and then measure the degree to which they improve calibration. To evaluate performance, we measure the accuracy of predictions in terms of the Brier Score before and after calibration, and employ repeated measures analysis of variance to test for significant differences between group means. Furthermore, we check calibration using reliability plots and decompose the Brier Score to clarify the origin of performance differences across calibrators. The observed results suggest that post-processing scorecard predictions using a calibrator is beneficial. Calibrators improve scorecard calibration while the discriminatory ability remains unaffected. Generalized additive models are particularly suitable for calibrating classifier predictions. (C) 2017 Elsevier B.V. All rights reserved.
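The evaluation described in the abstract (Brier Score before and after calibration, plus a decomposition of the score) can be illustrated with a minimal sketch. This is not the authors' code: the data are made up, the function names are hypothetical, the decomposition shown is the standard Murphy decomposition (reliability, resolution, uncertainty) over equal-width bins, and histogram binning stands in as a simple example calibrator that the abstract does not name. The calibrator is fitted and evaluated on the same toy data purely for illustration.

```python
from collections import defaultdict


def bin_index(p, n_bins):
    """Map a probability in [0, 1] to an equal-width bin index."""
    return min(int(p * n_bins), n_bins - 1)


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def murphy_decomposition(probs, outcomes, n_bins=10):
    """Return (reliability, resolution, uncertainty).

    Brier = reliability - resolution + uncertainty holds exactly only when
    all predictions inside a bin are identical, as in the toy data below.
    """
    n = len(probs)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        bins[bin_index(p, n_bins)].append((p, y))
    reliability = resolution = 0.0
    for members in bins.values():
        k = len(members)
        mean_pred = sum(p for p, _ in members) / k
        observed = sum(y for _, y in members) / k
        reliability += k * (mean_pred - observed) ** 2
        resolution += k * (observed - base_rate) ** 2
    return reliability / n, resolution / n, base_rate * (1 - base_rate)


def fit_histogram_binning(probs, outcomes, n_bins=10):
    """Example calibrator (an assumption): observed default rate per bin."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [defaults, count]
    for p, y in zip(probs, outcomes):
        entry = totals[bin_index(p, n_bins)]
        entry[0] += y
        entry[1] += 1
    return {b: defaults / count for b, (defaults, count) in totals.items()}


def apply_histogram_binning(mapping, probs, n_bins=10):
    """Replace each score by its bin's observed rate; unseen bins pass through."""
    return [mapping.get(bin_index(p, n_bins), p) for p in probs]


# Made-up scorecard outputs (predicted default probability) and outcomes (1 = default).
probs = [0.1, 0.1, 0.1, 0.1, 0.8, 0.8, 0.8, 0.8]
outcomes = [0, 0, 0, 1, 1, 1, 1, 0]

before = brier_score(probs, outcomes)
rel, res, unc = murphy_decomposition(probs, outcomes)

mapping = fit_histogram_binning(probs, outcomes)
calibrated = apply_histogram_binning(mapping, probs)
after = brier_score(calibrated, outcomes)
rel_after, res_after, unc_after = murphy_decomposition(calibrated, outcomes)

print(f"Brier before: {before:.4f}, after: {after:.4f}")
print(f"Reliability before: {rel:.4f}, after: {rel_after:.4f}")
```

In this toy case the calibrator lowers the Brier Score by removing the reliability term, while the ranking of the two score groups, and hence discrimination, is unchanged. A real study would fit the calibrator on held-out data.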
Pages: 213-227
Number of pages: 15