Credit risk prediction with and without weights of evidence using quantitative learning models

Cited by: 3
Authors
Seitshiro, Modisane B. [1, 2]
Govender, Seshni [3]
Affiliations
[1] North West Univ, Ctr Business Math & Informat, Potchefstroom Campus,Private Bag X6001, ZA-2520 Potchefstroom, South Africa
[2] Natl Inst Theoret & Computat Sci NITheCS, Stellenbosch, South Africa
[3] Univ South Africa, Dept Decis Sci, Pretoria, South Africa
Keywords
Credit risk; logistic regression; machine learning; model risk; parameter estimation; probability of default; weights of evidence; optimisation
DOI
10.1080/23322039.2024.2338971
Chinese Library Classification (CLC)
F [Economics]
Subject classification code
02
Abstract
The credit risk assessment process is necessary for maintaining financial stability, cost and time efficiency, model performance accuracy, comparability analysis and future business implications in the commercial banking sector. By accurately predicting credit risk, highly regulated banks can make informed lending decisions and minimize potential financial losses. The purpose of this paper is to assess the power of conventional predictive statistical models with and without transforming the features to gain better insight into customers' creditworthiness. The predictive performance of the logistic regression model is compared with that of machine learning models for credit risk assessment using commercial banking credit registry data. Each model has its strengths and weaknesses, and where one model falls short, another performs better. The article reveals that simpler credit risk assessment techniques delivered outstanding performance while consuming less processing power, and gave insight into the feature categories that contribute most. Applying some of the feature transformations to a conventional predictive statistical model reduces overall model performance, specifically for credit registry data. The logistic regression model outperformed all other models, with the highest F1, accuracy, Jaccard Index and AUC values.
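The abstract refers to transforming features with weights of evidence (WoE) but does not spell out the transformation. The following is a minimal sketch of one common WoE convention, ln(share of non-defaulters / share of defaulters) per category, and is not taken from the paper. The function name `weights_of_evidence`, the `smoothing` constant, and the toy data are all hypothetical; some practitioners use the inverse ratio, which flips the sign.

```python
import math
from collections import Counter


def weights_of_evidence(categories, defaults, smoothing=0.5):
    """Return {category: WoE} for one categorical feature.

    categories: category label per borrower
    defaults:   0/1 flag per borrower (1 = defaulted), same length
    smoothing:  hypothetical additive constant that avoids log(0)
                when a category has no defaulters or no non-defaulters
    """
    good = Counter()  # non-defaulters per category
    bad = Counter()   # defaulters per category
    for cat, flag in zip(categories, defaults):
        if flag == 1:
            bad[cat] += 1
        else:
            good[cat] += 1

    levels = set(good) | set(bad)
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)

    woe = {}
    for cat in levels:
        share_good = (good[cat] + smoothing) / total_good
        share_bad = (bad[cat] + smoothing) / total_bad
        woe[cat] = math.log(share_good / share_bad)
    return woe


# Toy data (invented for illustration only).
categories = [
    "employed", "employed", "employed",
    "self-employed", "self-employed",
    "unemployed", "unemployed", "unemployed",
]
defaults = [0, 0, 1, 0, 1, 1, 1, 0]

woe = weights_of_evidence(categories, defaults)
# Replace each category with its WoE value before fitting a model.
encoded = [woe[c] for c in categories]
```

Under this convention a positive WoE marks a category with relatively more non-defaulters and a negative WoE one with relatively more defaulters. The encoded column can then be passed to a logistic regression in place of the raw category, which is the "with weights of evidence" setting the title describes.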
Pages: 19
Related papers
44 in total
[1]   Effects of Distance Measure Choice on K-Nearest Neighbor Classifier Performance: A Review [J].
Abu Alfeilat, Haneen Arafat ;
Hassanat, Ahmad B. A. ;
Lasassmeh, Omar ;
Tarawneh, Ahmad S. ;
Alhasanat, Mahmoud Bashir ;
Salman, Hamzeh S. Eyal ;
Prasath, V. B. Surya .
BIG DATA, 2019, 7 (04) :221-248
[2]  
Aggarwal A., 2021, INT C MACHINE LEARNI, P120
[3]  
Benedict G, 2021, Arxiv, DOI [arXiv:2108.10566, DOI 10.48550/ARXIV.2108.10566]
[4]  
Beniwal S., 2012, International Journal of Engineering Research Technology (IJERT), V1, P1
[5]  
Bernardo JM., 1994, Bayesian Theory, DOI 10.1002/9780470316870
[6]   Network centrality effects in peer to peer lending [J].
Chen, Xiao ;
Chong, Zhaohui ;
Giudici, Paolo ;
Huang, Bihong .
PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2022, 600
[7]  
Czepiel Scott A., 2002, Maximum likelihood estimation of logistic regression models: theory and implementation, P83
[8]   COMPARING THE AREAS UNDER 2 OR MORE CORRELATED RECEIVER OPERATING CHARACTERISTIC CURVES - A NONPARAMETRIC APPROACH [J].
DELONG, ER ;
DELONG, DM ;
CLARKEPEARSON, DI .
BIOMETRICS, 1988, 44 (03) :837-845
[9]   Machine learning for credit scoring: Improving logistic regression with non-linear decision-tree effects [J].
Dumitrescu, Elena ;
Hue, Sullivan ;
Hurlin, Christophe ;
Tokpavi, Sessi .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2022, 297 (03) :1178-1192
[10]   Optimization for Medical Image Segmentation: Theory and Practice When Evaluating With Dice Score or Jaccard Index [J].
Eelbode, Tom ;
Bertels, Jeroen ;
Berman, Maxim ;
Vandermeulen, Dirk ;
Maes, Frederik ;
Bisschops, Raf ;
Blaschko, Matthew B. .
IEEE TRANSACTIONS ON MEDICAL IMAGING, 2020, 39 (11) :3679-3690