A method of credit evaluation modeling based on block-wise missing data

Authors
Qiujun Lan
Shan Jiang
Affiliations
[1] Business School of Hunan University
[2] Hunan Key Laboratory of Data Science and Blockchain
Source
Applied Intelligence, 2021, Vol. 51
Keywords
Missing data; Credit evaluation; Data mining; Multi-task learning
Abstract
Missing data is a common problem in credit evaluation practice and can obstruct the development and application of an evaluation model. Block-wise missing data, in which entire groups of features are absent for some samples, is a particularly troublesome case. Based on a multi-task feature selection approach, this paper proposes a method called MMPFS for building a credit evaluation model. The method consists of two main steps: (1) dividing the dataset into several nonoverlapping subsets according to their missing patterns, and (2) applying multi-task feature selection with logistic regression to learn features jointly across all subsets. The proposed method has four advantages: (1) missing data need not be handled in advance, (2) all available data are used for model learning, (3) the information loss or bias introduced by common missing data treatments is avoided, and (4) the risk of overfitting caused by redundant features is reduced. The paper describes the implementation framework and algorithmic principles of the method and evaluates it on three credit datasets from the UCI repository, comparing it with commonly used missing data treatments. The results show that MMPFS can produce a better credit evaluation model than data preprocessing methods such as sample deletion and data imputation.
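The two steps of MMPFS summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact loss, regularizer, or solver, so the sketch assumes a common multi-task feature selection formulation, namely per-subset logistic regression with an l2,1 (group-lasso) penalty that ties each feature's weights together across subsets, minimized by proximal gradient descent. All function names and the hyperparameters `lam`, `step` and `iters` are hypothetical, and missing values are represented as `None`.

```python
import math
from collections import defaultdict


def partition_by_missing_pattern(rows):
    """Step (1): split (x, y) rows into nonoverlapping subsets keyed by which
    features are observed (None marks a missing value)."""
    groups = defaultdict(list)
    for x, y in rows:
        pattern = tuple(v is not None for v in x)
        groups[pattern].append((x, y))
    return dict(groups)


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_multitask_l21(groups, lam=0.01, step=0.1, iters=500):
    """Step (2), ASSUMED formulation: proximal gradient descent on
        sum_t mean_logistic_loss_t(w_t, b_t) + lam * sum_j ||W[j, :]||_2,
    where task t is one missing-pattern subset and only uses the features it
    observes. The l2,1 penalty couples the tasks, so a feature is kept or
    dropped jointly across all subsets."""
    patterns = list(groups)
    d, T = len(patterns[0]), len(patterns)
    W = [[0.0] * T for _ in range(d)]  # W[j][t]: weight of feature j in task t
    b = [0.0] * T                      # per-task intercepts (not penalized)
    for _ in range(iters):
        gW = [[0.0] * T for _ in range(d)]
        gb = [0.0] * T
        for t, pat in enumerate(patterns):
            data = groups[pat]
            n = len(data)
            for x, y in data:
                z = b[t] + sum(W[j][t] * x[j] for j in range(d) if pat[j])
                r = sigmoid(z) - y  # d(log-loss)/dz for labels y in {0, 1}
                gb[t] += r / n
                for j in range(d):
                    if pat[j]:
                        gW[j][t] += r * x[j] / n
        for t in range(T):
            b[t] -= step * gb[t]
        for j in range(d):
            # Gradient step, then group soft-thresholding of row j across tasks.
            # Entries for tasks that never observe feature j stay exactly 0.
            row = [W[j][t] - step * gW[j][t] for t in range(T)]
            norm = math.sqrt(sum(v * v for v in row))
            shrink = max(0.0, 1.0 - step * lam / norm) if norm > 0 else 0.0
            W[j] = [shrink * v for v in row]
    return patterns, W, b


def selected_features(W, tol=1e-12):
    """Indices of features whose weight row survived the l2,1 penalty."""
    return [j for j, row in enumerate(W) if math.sqrt(sum(v * v for v in row)) > tol]


def predict_proba(model, x):
    """Score a row with the task model matching its missing pattern
    (raises ValueError if that pattern was not seen in training)."""
    patterns, W, b = model
    pat = tuple(v is not None for v in x)
    t = patterns.index(pat)
    z = b[t] + sum(W[j][t] * x[j] for j in range(len(x)) if pat[j])
    return sigmoid(z)
```

Because each subset is treated as its own task, a new applicant is scored by the model for their own missing pattern, so no deletion or imputation is needed beforehand. A pattern never seen during training would need separate handling, which this sketch does not attempt.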
Pages: 6859-6880