Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction

Authors
Akinjole, Abisola [1]
Shobayo, Olamilekan [1]
Popoola, Jumoke [1]
Okoyeigbo, Obinna [2]
Ogunleye, Bayode [3]
Affiliations
[1] Sheffield Hallam Univ, Dept Comp, Sheffield S1 2NU, England
[2] Edge Hill Univ, Dept Psychol, Ormskirk L39 4QP, England
[3] Univ Brighton, Dept Comp & Math, Brighton BN2 4GJ, England
Keywords
credit default prediction; deep learning; ensemble learning; machine learning; CREDIT; NETWORK; TREES; SMOTE;
DOI
10.3390/math12213423
Chinese Library Classification number
O1 [Mathematics];
Discipline classification code
0701; 070101;
Abstract
Predicting credit default risk is important to financial institutions: an accurate estimate of the likelihood that a borrower will default on a loan helps to reduce financial losses and so maintain profitability and stability. Although machine learning models have been used to assess large volumes of applications with complex attributes, there is still a need to identify the most effective techniques for the model development process, including techniques that address data imbalance. In this research, we conducted a comparative analysis of random forest, decision tree, Support Vector Machines (SVMs), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost) and the multilayer perceptron to predict credit defaults using loan data from LendingClub. XGBoost was also used as a framework for testing and evaluating various techniques. We applied this XGBoost framework to the observed class imbalance by testing resampling methods such as Random Over-Sampling (ROS), the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Under-Sampling (RUS), and hybrid approaches such as SMOTE with Tomek Links and SMOTE with Edited Nearest Neighbours (SMOTE + ENNs). The results showed that the balanced datasets significantly outperformed the imbalanced dataset, with SMOTE + ENNs delivering the best overall performance: an accuracy of 90.49%, a precision of 94.61% and a recall of 92.02%. Ensemble methods such as voting and stacking were then employed to enhance performance further. Our proposed model achieved an accuracy of 93.7%, a precision of 95.6% and a recall of 95.5%, which shows the potential of ensemble methods to improve credit default predictions and to give lending platforms a tool for reducing default rates and financial losses.
In conclusion, the findings from this study have broader implications for financial institutions, offering a robust approach to risk assessment beyond the LendingClub dataset.
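The abstract names random over-sampling (ROS) and random under-sampling (RUS) among the resampling methods and reports accuracy, precision and recall. The following is a minimal sketch of those three ideas in plain Python, not the authors' pipeline. The toy labels, the function names and the choice of class 1 as the positive class are all assumptions made for illustration; the record does not say which class the paper treats as positive, and SMOTE, ADASYN and the hybrid methods are not reproduced here.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """ROS: duplicate randomly chosen rows of each smaller class
    until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


def random_under_sample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out


def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical toy data: 8 rows of class 0 and 2 rows of class 1.
X = [[i] for i in range(10)]
y = [0] * 8 + [1] * 2

X_ros, y_ros = random_over_sample(X, y)
X_rus, y_rus = random_under_sample(X, y)
print("original:", Counter(y))
print("after ROS:", Counter(y_ros))
print("after RUS:", Counter(y_rus))

# Hypothetical predictions, used only to exercise the metric function.
print(binary_metrics([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0]))
```

In practice, resampling of this kind is normally applied to the training split only, so that the test split keeps the original class distribution; whether the paper follows that protocol is not stated in this record.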
Pages: 31