An Empirical Comparison of Machine-Learning Methods on Bank Client Credit Assessments

Cited by: 68
Authors
Munkhdalai, Lkhagvadorj [1]
Munkhdalai, Tsendsuren [2]
Namsrai, Oyun-Erdene [3]
Lee, Jong Yun [1]
Ryu, Keun Ho [4]
Affiliations
[1] Chungbuk Natl Univ, Coll Elect & Comp Engn, Database Bioinformat Lab, Cheongju 28644, South Korea
[2] Microsoft Res, Montreal, PQ H3A 3H3, Canada
[3] Natl Univ Mongolia, Dept Informat & Comp Sci, Bldg 3 Room 212, Ulaanbaatar 14201, Mongolia
[4] Ton Duc Thang Univ, Fac Informat Technol, Ho Chi Minh City 700000, Vietnam
Funding
National Research Foundation of Singapore;
Keywords
automated credit scoring; decision making; machine learning; internet bank; sustainability; SUPPORT VECTOR MACHINES; FEATURE-SELECTION; NEURAL-NETWORKS; REGRESSION-ANALYSIS; MODEL; RISK; BENCHMARKING; PROBABILITY; CLASSIFIERS; ALGORITHMS;
DOI
10.3390/su11030699
Chinese Library Classification number
X [Environmental Science, Safety Science];
Discipline classification code
08 ; 0830 ;
Abstract
Machine learning and artificial intelligence have achieved human-level performance in many application domains, including image classification, speech recognition and machine translation. However, in the financial domain, expert-based credit risk models still dominate. Establishing meaningful benchmarks and comparisons between machine-learning approaches and human expert-based models is a prerequisite for introducing novel methods. Therefore, our main goal in this study is to establish a new benchmark using real consumer data and to provide machine-learning approaches that can serve as baselines on this benchmark. We performed an extensive comparison between the machine-learning approaches and a human expert-based model, the FICO credit scoring system, using Survey of Consumer Finances (SCF) data. As the SCF data is non-synthetic and consists of a large number of real variables, we applied two variable-selection methods to select the most representative features for effective modelling and compared them: the first used hypothesis tests, correlation and random forest-based feature importance measures, and the second was a new approach (NAP) based only on random forests. We then built regression models based on various machine-learning algorithms ranging from logistic regression and support vector machines to an ensemble of gradient boosted trees and deep neural networks. Our results demonstrated that if lending institutions in the 2001s had used their own credit scoring model constructed with the machine-learning methods explored in this study, their expected credit losses would have been lower, and they would have been more sustainable. In addition, the deep neural networks and XGBoost algorithms trained on the subset selected by NAP achieved the highest area under the curve (AUC) and accuracy, respectively.
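The abstract compares models by area under the ROC curve (AUC) and accuracy. As a rough illustration of how these two metrics differ, the sketch below computes both from predicted scores in plain Python. The function names, the 0.5 threshold, and the toy labels and scores are assumptions made for illustration only; they are not taken from the paper or from the SCF data.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as half a win)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def accuracy(y_true, scores, threshold=0.5):
    """Share of cases classified correctly after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, y_true)) / len(y_true)


# Hypothetical toy data: 1 = default, 0 = no default (not real results).
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9]

print(f"AUC      = {roc_auc(y_true, scores):.3f}")
print(f"accuracy = {accuracy(y_true, scores):.3f}")
```

AUC depends only on how the scores rank the cases, while accuracy depends on the chosen threshold, which is one reason two different models can each come out best on one of the two metrics.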
Pages: 23