Ensemble-Based Machine Learning Algorithm for Loan Default Risk Prediction

Cited by: 0

Authors:

Akinjole, Abisola [1]

Shobayo, Olamilekan [1]

Popoola, Jumoke [1]

Okoyeigbo, Obinna [2]

Ogunleye, Bayode [3]

Affiliations:

[1] Sheffield Hallam Univ, Dept Comp, Sheffield S1 2NU, England

[2] Edge Hill Univ, Dept Psychol, Ormskirk L39 4QP, England

[3] Univ Brighton, Dept Comp & Math, Brighton BN2 4GJ, England

Source:

MATHEMATICS | 2024, Vol. 12, Issue 21

Keywords:

credit default prediction; deep learning; ensemble learning; machine learning; CREDIT; NETWORK; TREES; SMOTE;

DOI:

10.3390/math12213423

Chinese Library Classification (CLC) number:

O1 [Mathematics];

Discipline classification codes:

0701 ; 070101 ;

Abstract:

Predicting credit default risk is important to financial institutions: accurately estimating the likelihood that a borrower will default on a loan helps to reduce financial losses and so maintain profitability and stability. Although machine learning models have been used to assess large volumes of applications with complex attributes, there is still a need to identify the most effective techniques for model development, including techniques for addressing data imbalance. In this research, we conducted a comparative analysis of random forest, decision tree, Support Vector Machines (SVMs), Extreme Gradient Boosting (XGBoost), Adaptive Boosting (AdaBoost) and the multilayer perceptron for predicting credit defaults using loan data from LendingClub. XGBoost was also used as a framework for testing and evaluating various techniques. We applied this XGBoost framework to the observed class imbalance by testing several resampling methods: Random Over-Sampling (ROS), the Synthetic Minority Over-Sampling Technique (SMOTE), Adaptive Synthetic Sampling (ADASYN), Random Under-Sampling (RUS), and hybrid approaches such as SMOTE with Tomek Links and SMOTE with Edited Nearest Neighbours (SMOTE + ENNs). The results showed that models trained on balanced datasets significantly outperformed those trained on the imbalanced dataset, with SMOTE + ENNs delivering the best overall performance: an accuracy of 90.49%, a precision of 94.61% and a recall of 92.02%. Ensemble methods such as voting and stacking were then employed to enhance performance further. Our proposed model achieved an accuracy of 93.7%, a precision of 95.6% and a recall of 95.5%, which shows the potential of ensemble methods to improve credit default prediction and to give lending platforms a tool for reducing default rates and financial losses. In conclusion, the findings from this study have broader implications for financial institutions, offering a robust approach to risk assessment beyond the LendingClub dataset.
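As an illustration of two ideas named in the abstract, the following is a minimal sketch of Random Over-Sampling (ROS), the simplest of the resampling methods listed, together with the accuracy, precision and recall metrics used to report results. It is not the authors' implementation: the function names `random_over_sample` and `accuracy_precision_recall`, the toy data, and the choice of label 1 as the positive class are all assumptions made for this example. In practice, resampling is normally applied to the training split only, so that the test data keeps its original class distribution.

```python
import random
from collections import Counter


def random_over_sample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))  # sample with replacement
            y_out.append(label)
    return X_out, y_out


def accuracy_precision_recall(y_true, y_pred, positive=1):
    """Hypothetical helper: compute the three metrics for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Toy imbalanced data (made up for illustration): four rows of class 0, two of class 1.
X = [[0.1], [0.2], [0.3], [0.4], [0.9], [1.0]]
y = [0, 0, 0, 0, 1, 1]
X_bal, y_bal = random_over_sample(X, y)
balanced_counts = Counter(y_bal)

# Toy predictions (made up for illustration) scored against toy labels.
y_true = [1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0]
acc, prec, rec = accuracy_precision_recall(y_true, y_pred)
```

ROS only duplicates existing minority rows, whereas SMOTE and ADASYN synthesise new ones and the hybrid variants additionally remove ambiguous samples. Those methods are usually taken from a dedicated library rather than written by hand.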

Pages: 31

