Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction

Authors
Akinjole, Abisola [1]
Shobayo, Olamilekan [1]
Popoola, Jumoke [1]
Okoyeigbo, Obinna [2]
Ogunleye, Bayode [3]
Affiliations
[1] Sheffield Hallam Univ, Dept Comp, Sheffield S1 2NU, England
[2] Edge Hill Univ, Dept Psychol, Ormskirk L39 4QP, England
[3] Univ Brighton, Dept Comp & Math, Brighton BN2 4GJ, England
Keywords
credit default prediction; deep learning; ensemble learning; machine learning; CREDIT; NETWORK; TREES; SMOTE;
DOI
10.3390/math12213423
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Predicting credit default risk is important to financial institutions: accurately estimating the likelihood that a borrower will default on a loan helps to reduce financial losses and thereby maintain profitability and stability. Although machine learning models have been used to assess large volumes of applications with complex attributes, there is still a need to identify the most effective techniques for model development, including techniques for addressing data imbalance. In this research, we conducted a comparative analysis of random forest, decision tree, Support Vector Machines (SVMs), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost) and the multilayer perceptron for predicting credit defaults using loan data from LendingClub. XGBoost was also used as a framework for testing and evaluating various techniques. We applied this XGBoost framework to the observed class imbalance by testing resampling methods such as Random Over-Sampling (ROS), the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Under-Sampling (RUS), and hybrid approaches such as SMOTE with Tomek Links and SMOTE with Edited Nearest Neighbours (SMOTE + ENN). Models trained on balanced datasets significantly outperformed those trained on the imbalanced dataset, with SMOTE + ENN delivering the best overall performance: an accuracy of 90.49%, a precision of 94.61% and a recall of 92.02%. Ensemble methods such as voting and stacking were then employed to enhance performance further. Our proposed model achieved an accuracy of 93.7%, a precision of 95.6% and a recall of 95.5%, which shows the potential of ensemble methods to improve credit default prediction and can give lending platforms a tool for reducing default rates and financial losses. In conclusion, the findings of this study have broader implications for financial institutions, offering a robust approach to risk assessment beyond the LendingClub dataset.
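As a rough illustration of three ideas the abstract names (SMOTE-style oversampling, hard voting, and the accuracy, precision and recall metrics), the following is a minimal standard-library sketch. It is not the authors' pipeline, which is not shown in this record. All function names are hypothetical, the toy data are invented for illustration, and label 1 is assumed to be the positive (default) class.

```python
import math
import random
from collections import Counter


def smote_oversample(minority, n_new, k=2, seed=0):
    """SMOTE-style sketch: build n_new synthetic points, each on the line
    segment between a minority sample and one of its k nearest minority
    neighbours. `minority` is a list of equal-length numeric tuples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


def hard_vote(model_predictions):
    """Hard voting: each inner list holds one model's labels for all samples;
    the most common label per sample wins (ties go to the first label seen)."""
    return [Counter(votes).most_common(1)[0][0]
            for votes in zip(*model_predictions)]


def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall, treating label 1 as positive (assumed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall
```

In practice one would normally rely on maintained library implementations of resampling and ensembling rather than hand-written versions; the sketch only makes the mechanics concrete. Note that resampling is usually applied to the training split only, so that evaluation metrics reflect the original class distribution.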
Pages: 31