Credit risk assessment using statistical and machine learning: basic methodology and risk modeling applications

Cited by: 125
Authors
Galindo, J.
Tamayo, P.
Affiliations
[1] Harvard University, Department of Economics
[2] Thinking Machines Corp.
Keywords
Risk Assessment; Training Sample; Average Error; Modeling Technique; Risk Model
DOI
10.1023/A:1008699112516
Abstract
Risk assessment of financial intermediaries is an area of renewed interest due to the financial crises of the 1980s and 1990s. An accurate estimation of risk, and its use in corporate or global financial risk models, could translate into a more efficient use of resources. One important ingredient in accomplishing this goal is finding accurate predictors of individual risk in the credit portfolios of institutions. In this context, we make a comparative analysis of different statistical and machine learning classification methods on a mortgage loan data set, with the aim of understanding their limitations and potential. We introduce a specific modeling methodology based on the study of error curves. Using state-of-the-art modeling techniques, we built more than 9,000 models as part of the study. The results show that CART decision-tree models provide the best estimation of default, with an average error rate of 8.31% for a training sample of 2,000 records. From the error curve analysis for this model, we conclude that if more data were available, approximately 22,000 records, a potential error rate of 7.32% could be achieved. Neural networks provided the second-best results, with an average error of 11.00%. The K-Nearest Neighbor algorithm had an average error rate of 14.95%. These results outperformed the standard Probit algorithm, which attained an average error rate of 15.13%. Finally, we discuss the possibility of using this type of accurate predictive model as an ingredient of institutional and global risk models.
Pages: 107-143
Page count: 36