Machine Learning for an Enhanced Credit Risk Analysis: A Comparative Study of Loan Approval Prediction Models Integrating Mental Health Data

Cited by: 0
Authors
Alagic, Adnan [1]
Zivic, Natasa [2]
Kadusic, Esad [3]
Hamzic, Dzenan [1]
Hadzajlic, Narcisa [1]
Dizdarevic, Mejra [1]
Selmanovic, Elmedin [4]
Affiliations
[1] Univ Zenica, Polytech Fac, Zenica 72000, Bosnia & Herceg
[2] Leipzig Univ Appl Sci, Fac Digital Transformat FDIT, D-04277 Leipzig, Germany
[3] Univ Sarajevo, Fac Educ Sci, Sarajevo 71000, Bosnia & Herceg
[4] Univ Sarajevo, Fac Sci, Sarajevo 71000, Bosnia & Herceg
Source
MACHINE LEARNING AND KNOWLEDGE EXTRACTION | 2024 / Vol. 6 / Issue 1
Keywords
machine learning; prediction; supervised learning; classification; business intelligence; boosting algorithms; credit risk; loan approval
DOI
10.3390/make6010004
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The number of loan requests is growing rapidly worldwide, representing a multi-billion-dollar business in the credit approval industry. Large volumes of data extracted from banking transactions that reflect customers' behavior are available, but processing loan applications remains a complex and time-consuming task for banking institutions. In 2022, over 20 million Americans had open loans totaling USD 178 billion in debt, while over 20% of loan applications were rejected. Numerous statistical methods have been deployed to estimate loan risk, which raises the question of whether machine learning techniques can predict these risks better. To study the machine learning paradigm in this sector, a mental health dataset and a loan approval dataset, presenting survey results from 1,991 individuals, are used as inputs to test the credit risk prediction ability of the chosen machine learning algorithms. Through a comprehensive comparative analysis, this paper shows how the chosen algorithms can distinguish between normal and risky loan customers who might never repay their debts. The results show that XGBoost achieves the highest accuracy in the first dataset (84%), surpassing gradient boosting (83%) and KNN (83%). In the second dataset, random forest achieves the highest accuracy (85%), followed by decision tree and KNN (83% each). Alongside accuracy, the precision, recall, and overall performance of the algorithms were evaluated, and a confusion matrix analysis was performed; the resulting numerical results emphasized the superior classification performance of XGBoost and random forest in the first dataset, and of XGBoost and decision tree in the second dataset. Researchers and practitioners can rely on these findings to inform their model selection and to improve the accuracy and precision of their classification models.
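The abstract compares models by accuracy, precision, and recall derived from a confusion matrix. As a minimal sketch (not the authors' code), the Python below shows how these three metrics follow from the four cells of a binary confusion matrix: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN). It assumes label 1 marks a risky customer and 0 a normal one; the helper names `confusion_counts` and `classification_metrics` and the toy labels are hypothetical and are not data from the study.

```python
from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, int]:
    """Count TP, FP, FN, TN for a binary task (assumed: 1 = risky, 0 = normal)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t not in (0, 1) or p not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if t == 1 and p == 1:
            counts["tp"] += 1  # risky customer correctly flagged
        elif t == 0 and p == 1:
            counts["fp"] += 1  # normal customer wrongly flagged as risky
        elif t == 1 and p == 0:
            counts["fn"] += 1  # risky customer missed
        else:
            counts["tn"] += 1  # normal customer correctly accepted
    return counts


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Derive accuracy, precision, and recall from the confusion-matrix counts."""
    c = confusion_counts(y_true, y_pred)
    total = c["tp"] + c["fp"] + c["fn"] + c["tn"]
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    return {
        "accuracy": (c["tp"] + c["tn"]) / total if total else 0.0,
        "precision": c["tp"] / predicted_pos if predicted_pos else 0.0,
        "recall": c["tp"] / actual_pos if actual_pos else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical toy labels for illustration only.
    y_true = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
    print(confusion_counts(y_true, y_pred))        # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 5}
    print(classification_metrics(y_true, y_pred))  # {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
```

Two models with the same accuracy can differ in precision and recall, which is why the abstract reports all three alongside the confusion matrix.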
Pages: 53-77
Page count: 25