The extraction of early warning features for predicting financial distress based on XGBoost model and SHAP framework

Cited by: 7
Authors
Yang, He [1]
Li, Emma [2]
Cai, Yi Fang [1]
Li, Jiapei [3]
Yuan, George X. [4, 5, 6, 7]
Affiliations
[1] Zhengzhou Univ, Sch Math & Stats, Zhengzhou 450001, Peoples R China
[2] Henan Expt High Sch, Zhengzhou 450001, Peoples R China
[3] Zhengzhou Univ, Henan Key Lab Financial Engn, Zhengzhou 450001, Peoples R China
[4] Guangxi Univ, Business Sch, Nanning 530004, Peoples R China
[5] Sun Yat Sen Univ, Business Sch, Guangzhou 510275, Peoples R China
[6] Chengdu Univ, Business Sch, Chengdu 610106, Peoples R China
[7] BBD Technol Co Ltd, 966 9 Bldg,Tianfu Ave, Chengdu 610093, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Financial distress; early-warning feature; XGBoost; SHAP framework; AUC and KS testing; machine learning; RATIOS;
DOI
10.1142/S2424786321410048
Chinese Library Classification (CLC) number
F8 [Public Finance, Finance]
Discipline classification code
0202
Abstract
This paper establishes a framework for extracting early warning risk features for predicting financial distress, based on the XGBoost model and SHAP. How early warning risk features are constructed is central to predicting the financial distress of companies. Compared with traditional statistical methods, data-driven machine learning achieves better prediction accuracy in financial early warning modelling, but the resulting models can be difficult to explain. eXtreme Gradient Boosting (XGBoost), an ensemble learning algorithm based on gradient boosting, has recently become a popular topic in machine learning research because of its strong ability to recognize nonlinear information and its high prediction accuracy in practice. In this study, the XGBoost algorithm is used to extract early warning features for predicting the financial distress of listed companies. A total of 76 financial risk features from seven categories and 14 non-financial risk features from four categories are collected to establish an early warning system for predicting financial distress. In empirical tests with respect to AUC, KS and Kappa, the numerical results show that the XGBoost-based method predicts the financial distress risk of listed companies much better than the Logistic model. Moreover, under the SHAP (SHapley Additive exPlanations) framework, we are able to give a reasonable, visual explanation of the important risk features and of the ways in which they affect financial distress. The results show that the XGBoost approach to modelling early warning features for financial distress not only achieves better prediction accuracy but is also explainable, which is significant for identifying early warnings of financial distress risk for listed companies in practice.
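The three evaluation measures named in the abstract (AUC, KS and Kappa) can be illustrated with a minimal sketch. The code below is not taken from the paper: the labels and scores are invented toy values, the 0.5 decision threshold is an assumption, and the function names are hypothetical. It uses the standard textbook definitions only.

```python
# Illustrative only: toy labels/scores are invented, not data from the paper.
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a distressed firm (label 1) receives a
    higher score than a healthy firm (label 0); ties count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(labels: Sequence[int], scores: Sequence[float]) -> float:
    """KS as the largest gap between the cumulative score distributions
    of the two classes, evaluated at every observed score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(1 for y, s in zip(labels, scores) if y == 1 and s <= t) / n_pos
        cdf_neg = sum(1 for y, s in zip(labels, scores) if y == 0 and s <= t) / n_neg
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


def cohen_kappa(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Cohen's kappa for binary labels: agreement corrected for chance."""
    n = len(labels)
    observed = sum(1 for y, p in zip(labels, preds) if y == p) / n
    p_true = sum(labels) / n
    p_pred = sum(preds) / n
    expected = p_true * p_pred + (1 - p_true) * (1 - p_pred)
    return (observed - expected) / (1 - expected)


# Hypothetical predicted distress probabilities for eight firms.
labels = [0, 0, 0, 0, 1, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.90]
preds = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(f"AUC   = {auc(labels, scores):.3f}")
print(f"KS    = {ks_statistic(labels, scores):.3f}")
print(f"Kappa = {cohen_kappa(labels, preds):.3f}")
```

AUC and KS are computed from the raw scores and so do not depend on a decision threshold, whereas Kappa is computed from thresholded class predictions; reporting all three therefore covers both ranking quality and classification agreement.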
Pages: 24