A machine learning-based credit risk prediction engine system using a stacked classifier and a filter-based feature selection method

Cited by: 18

Authors:

Emmanuel, Ileberi ^{[1]}

Sun, Yanxia ^{[1]}

Wang, Zenghui ^{[2]}

Affiliations:

[1] Univ Johannesburg, Dept Elect & Elect Engn Sci, Johannesburg, South Africa

[2] Univ South Africa, Dept Elect Engn, Johannesburg, South Africa

Source:

JOURNAL OF BIG DATA | 2024 / Vol. 11 / Issue 01

Funding:

National Research Foundation, Singapore;

Keywords:

Machine learning; Credit risk; Feature selection; KNN;

DOI:

10.1186/s40537-024-00882-0

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods];

Discipline classification code:

081202 ;

Abstract:

Credit risk prediction is a crucial task for financial institutions. Technological advancements in machine learning, coupled with the availability of data and computing power, have given rise to more credit risk prediction models in financial institutions. In this paper, we propose a stacked classifier approach coupled with a filter-based feature selection (FS) technique to achieve efficient credit risk prediction using multiple datasets. The proposed stacked model includes the following base estimators: Random Forest (RF), Gradient Boosting (GB), and Extreme Gradient Boosting (XGB). Furthermore, the estimators in the stacked architecture were linked sequentially to extract the best performance. The filter-based FS method used in this research is based on information gain (IG) theory. The proposed algorithm was evaluated using the accuracy, the F1-score and the Area Under the Curve (AUC). Furthermore, the stacked algorithm was compared to the following methods: Artificial Neural Network (ANN), Decision Tree (DT), and k-Nearest Neighbour (KNN). The experimental results show that the stacked model obtained AUCs of 0.934, 0.944 and 0.870 on the Australian, German and Taiwan datasets, respectively. These results, in conjunction with the accuracy and F1-score metrics, demonstrate that the proposed stacked classifier outperforms the individual estimators and other existing methods.
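The abstract names information gain as the basis of the filter-based FS step but gives no implementation details. As a rough illustration only, the sketch below scores discrete features by IG(Y; X) = H(Y) - H(Y | X) and keeps the top k. The toy data, the feature names and the helper `select_top_k` are hypothetical and are not taken from the paper; continuous features would need to be discretised first, which is not shown here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature X."""
    total = len(labels)
    groups = {}
    for x, y in zip(feature_values, labels):
        groups.setdefault(x, []).append(y)
    # Weighted average of the label entropy within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def select_top_k(columns, labels, k):
    """Rank features by information gain and return the k best names."""
    scores = {name: information_gain(vals, labels)
              for name, vals in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Hypothetical toy data (not from the paper): 1 = default, 0 = repaid.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "has_prior_default": [1, 1, 1, 0, 0, 0, 0, 0],  # matches the label exactly
    "owns_home":         [0, 1, 0, 1, 1, 0, 1, 1],  # partly informative
    "id_is_even":        [0, 1, 0, 1, 0, 1, 0, 1],  # close to noise
}

selected, scores = select_top_k(columns, labels, k=2)
for name in selected:
    print(f"{name}: IG = {scores[name]:.4f}")
```

Because a filter method scores each feature independently of any classifier, the selected columns could then be passed to whichever estimators are being stacked. How the paper chains its RF, GB and XGB estimators is not specified in the abstract and is not reproduced here.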


Pages: 14

References: 38 in total

