Hybrid Dual-Resampling and Cost-Sensitive Classification for Credit Risk Prediction

Cited by: 0
Authors
Osei-Brefo, Emmanuel [1]
Mitchell, Richard [1]
Hong, Xia [1]
Affiliations
[1] Univ Reading, Reading, England
Source
ARTIFICIAL INTELLIGENCE XL, AI 2023 | 2023 / Vol. 14381
Keywords
Class imbalance; Credit Risk Modelling; Gaussian Mixture Modelling; Logistic Regression; Cost-Sensitive Learning; VALIDATION;
DOI
10.1007/978-3-031-47994-6_32
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Class imbalance is prevalent in financial data sets and is problematic when evaluating credit risk. This paper proposes a hybrid dual-resampling and cost-sensitive classification approach that creates heuristically balanced data sets. Given an imbalanced credit data set, a synthetic minority class is generated by a resampling technique based on Gaussian mixture modelling of the minority class data. Simultaneously, k-means clustering is applied to the majority class. Feature selection is then performed using an Extra Tree Ensemble technique. Finally, a cost-sensitive logistic model is estimated on the heuristically balanced data sets and applied to predict the probability of default. The results show that the proposed technique outperforms other preprocessing approaches for imbalanced data.
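The four stages named in the abstract can be sketched with scikit-learn as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not say how the k-means result is used, so taking the cluster centroids as the reduced majority class is an assumption, and the function name `balance`, the number of mixture components, the per-class sample size, the median importance threshold, and the class weights are all hypothetical choices. Synthetic data from `make_classification` stands in for a credit data set.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.mixture import GaussianMixture


def balance(X, y, n_per_class=200, n_components=2, seed=0):
    """Build a balanced set: GMM samples for class 1, k-means centroids for class 0."""
    X_min, X_maj = X[y == 1], X[y == 0]
    # Oversampling: fit a Gaussian mixture to the minority class and draw from it.
    gmm = GaussianMixture(n_components=n_components, random_state=seed).fit(X_min)
    X_min_syn, _ = gmm.sample(n_per_class)
    # Undersampling (assumption): represent the majority class by k-means centroids.
    km = KMeans(n_clusters=n_per_class, n_init=3, random_state=seed).fit(X_maj)
    X_bal = np.vstack([km.cluster_centers_, X_min_syn])
    y_bal = np.concatenate([np.zeros(n_per_class, dtype=int),
                            np.ones(n_per_class, dtype=int)])
    return X_bal, y_bal


# Synthetic imbalanced data (about 5% positives) standing in for credit records.
X, y = make_classification(n_samples=2000, n_features=10, n_informative=5,
                           weights=[0.95, 0.05], random_state=0)
X_bal, y_bal = balance(X, y)

# Feature selection: keep features whose extra-trees importance is at least the median.
selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0),
                           threshold="median").fit(X_bal, y_bal)
X_sel = selector.transform(X_bal)

# Cost-sensitive logistic model: errors on the default class (1) are weighted
# more heavily. The 5:1 ratio is a placeholder, not a value from the paper.
clf = LogisticRegression(class_weight={0: 1.0, 1: 5.0}, max_iter=1000)
clf.fit(X_sel, y_bal)

# Predicted probability of default for the original records.
pd_hat = clf.predict_proba(selector.transform(X))[:, 1]
```

In practice the model would be scored on a held-out split of the original imbalanced data rather than on the records used to build the balanced set.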
Pages: 350-362
Page count: 13