Credit Scoring of Small and Micro Enterprises Based on Sample-Dependent Cost Matrix

Authors
Zhang T. [1, 2]
Wang Y. [1]
Li K. [1]
Zhang Y. [3, 4]
Affiliations
[1] School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai
[2] Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics, Shanghai
[3] School of Computer Science, Fudan University, Shanghai
[4] Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, Shanghai
Source
Tongji Daxue Xuebao/Journal of Tongji University | 2020 / Vol. 48 / No. 1
Keywords
Cost sensitive learning; Credit scoring; Minimum Bayes risk; Sample-dependent; XGBoost model
DOI
10.11908/j.issn.0253-374x.19017
Abstract
Because credit history data for small and micro enterprises are scarce and class imbalance is severe, this paper proposes a SMOTE-XGBoost-Bayes Minimum Risk (SXG-BMR) model based on a sample-dependent cost matrix. The whole sample is first oversampled at a low rate to mitigate class imbalance while reducing the risk of model overfitting. The model then combines an ensemble learning model with Bayes minimum risk decision-making to achieve cost sensitivity. In addition, the paper introduces a sample-dependent cost matrix into the model: the cost depends not only on the class but also on the attributes of each sample, so it characterizes misclassification costs more accurately. In the empirical study, the paper uses a standard credit dataset and a real credit dataset of small and micro enterprises in Shanghai, and compares and analyzes a range of algorithms. The results show that the proposed SXG-BMR model performs well. © 2020, Editorial Department of Journal of Tongji University. All rights reserved.
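The Bayes minimum risk step can be illustrated with a short sketch. Given an estimated default probability p_i for applicant i (in SXG-BMR this would come from the SMOTE-oversampled XGBoost model; here it is simply passed in), the rule predicts "default" when the expected cost of rejecting, p_i·C_TP,i + (1 − p_i)·C_FP,i, is no larger than the expected cost of accepting, p_i·C_FN,i + (1 − p_i)·C_TN,i. The cost model below is a hypothetical illustration and not the paper's definition: C_FN,i = loan amount × LGD (loss given default), C_FP,i = loan amount × interest rate (forgone profit), and C_TP,i = C_TN,i = 0. The function names and the column order of the cost array are likewise assumptions.

```python
import numpy as np


def example_dependent_costs(loan_amount, interest_rate, lgd=0.6):
    """Build a per-sample cost array with columns [C_FP, C_FN, C_TP, C_TN].

    Hypothetical cost model for illustration only (not the paper's):
      C_FN: accepting an applicant who defaults loses amount * lgd
      C_FP: rejecting a good applicant forgoes amount * interest_rate
      C_TP = C_TN = 0
    """
    amount = np.asarray(loan_amount, dtype=float)
    rate = np.asarray(interest_rate, dtype=float)
    c_fp = amount * rate
    c_fn = amount * lgd
    zeros = np.zeros_like(amount)
    return np.column_stack([c_fp, c_fn, zeros, zeros])


def bayes_minimum_risk(prob_default, cost_matrix):
    """Return 1 (predict default / reject) or 0 (accept) for each sample.

    prob_default: shape (n,), estimated P(default | x_i)
    cost_matrix:  shape (n, 4), columns [C_FP, C_FN, C_TP, C_TN]
    """
    p = np.asarray(prob_default, dtype=float)
    c = np.asarray(cost_matrix, dtype=float)
    c_fp, c_fn, c_tp, c_tn = c[:, 0], c[:, 1], c[:, 2], c[:, 3]
    # Expected cost of each decision under the estimated probability.
    risk_reject = p * c_tp + (1.0 - p) * c_fp
    risk_accept = p * c_fn + (1.0 - p) * c_tn
    # Choose the decision with the lower expected cost (ties go to reject).
    return (risk_reject <= risk_accept).astype(int)


if __name__ == "__main__":
    # Two applicants with the same default probability but different loan rates.
    p = np.array([0.15, 0.15])
    costs = example_dependent_costs([10000, 10000], [0.05, 0.20])
    print(bayes_minimum_risk(p, costs))  # [1 0]
```

In this example both applicants have p = 0.15, yet the first is rejected and the second accepted, because the forgone profit from rejecting the second (higher-rate) loan outweighs its expected default loss. A fixed 0.5 cut-off would accept both. With C_TP = C_TN = 0, the rule reduces to a per-sample threshold p_i ≥ C_FP,i / (C_FP,i + C_FN,i), which here is about 0.077 for the first applicant and 0.25 for the second. The rule relies on reasonably calibrated probabilities, so in practice the classifier's scores may need calibration before this step.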
Pages: 149-158
Page count: 9