CUS-heterogeneous ensemble-based financial distress prediction for imbalanced dataset with ensemble feature selection

Cited by: 58
Authors
Du, Xudong [1]
Li, Wei [2]
Ruan, Sumei [2]
Li, Li [2]
Affiliations
[1] Anhui Univ Finance & Econ, Sch Accountancy, Bengbu 233030, Anhui, Peoples R China
[2] Anhui Univ Finance & Econ, Sch Finance, Bengbu 233030, Anhui, Peoples R China
Keywords
Financial distress prediction; CUS-GBDT; XGBoost; Heterogeneous ensemble; Feature selection; DISCRIMINANT-ANALYSIS; MACHINE; MODELS; RATIOS;
DOI
10.1016/j.asoc.2020.106758
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Since the 2008 global financial crisis left a large number of companies in financial distress, machine learning-based prediction of such distress has proven highly practical for economic stakeholders. Within machine learning, most previous studies focus only on improving sampling methods for imbalanced datasets or on introducing multiple classifiers at the model construction stage. In view of this, this paper attempts to extend the scope and depth of ensembling to achieve better prediction performance on a severely imbalanced dataset of financial data from Chinese listed companies. For the first time, this paper combines clustering-based under-sampling (CUS) with the gradient boosting decision tree (GBDT) to construct a CUS-GBDT model, which is used together with the widely used extreme gradient boosting (XGBoost) as heterogeneous classifiers to build a heterogeneous ensemble for financial distress prediction. In addition, following the ensemble idea, this paper selects features using five feature selection methods grounded in different theoretical backgrounds, and applies ensembling across the whole process of feature selection, data preprocessing and model construction. In the comparative experiments, the proposed method achieves the best performance on the test set. This study demonstrates the broad applicability of CUS to financial data processing and the superior generalization performance of the ensemble model relative to individual learners. (C) 2020 Elsevier B.V. All rights reserved.
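The abstract names clustering-based under-sampling (CUS) but does not spell out its steps. As a rough illustration only, the sketch below shows one common variant: cluster the majority class with k-means into as many groups as there are minority samples, then keep the real sample nearest each centroid. The function names, the toy data and the choice of k-means are assumptions made here and are not taken from the paper; the paper's exact CUS procedure may differ. The balanced data would then be passed to a classifier such as GBDT, which is not shown.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns the final list of k centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def cluster_undersample(majority, minority, seed=0):
    """Reduce the majority class to len(minority) representative samples.

    Assumed variant: cluster the majority class into len(minority) groups,
    then replace each centroid by the closest real sample not yet chosen.
    """
    k = len(minority)
    centroids = kmeans(majority, k, seed=seed)
    chosen = []
    remaining = list(majority)
    for c in centroids:
        best = min(remaining, key=lambda p: math.dist(p, c))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical toy data: 12 "healthy" firms vs. 3 "distressed" firms,
# each described by two made-up financial ratios.
rng = random.Random(42)
majority = [(rng.gauss(1.0, 0.3), rng.gauss(0.5, 0.2)) for _ in range(12)]
minority = [(rng.gauss(-0.5, 0.3), rng.gauss(-0.2, 0.2)) for _ in range(3)]

balanced_majority = cluster_undersample(majority, minority)
print(len(balanced_majority), len(minority))  # both classes now have 3 samples
```

Choosing real samples rather than the centroids themselves keeps every retained row an actual observed company, which is one reason this variant is often preferred for tabular financial data.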
Pages: 13
Related papers
45 in total
  • [1] FINANCIAL RATIOS, DISCRIMINANT ANALYSIS AND PREDICTION OF CORPORATE BANKRUPTCY
    ALTMAN, EI
    [J]. JOURNAL OF FINANCE, 1968, 23 (04) : 589 - 609
  • [2] [Anonymous], 1972, AUD RES MON, P94
  • [3] [Anonymous], 2006, GESTS INT T COMPUT S
  • [4] Machine learning models and bankruptcy prediction
    Barboza, Flavio
    Kimura, Herbert
    Altman, Edward
    [J]. EXPERT SYSTEMS WITH APPLICATIONS, 2017, 83 : 405 - 417
  • [5] FINANCIAL RATIOS AS PREDICTORS OF FAILURE
    BEAVER, WH
    [J]. JOURNAL OF ACCOUNTING RESEARCH, 1966, 4 : 71 - 111
  • [6] Bryant S. M., 1997, International Journal of Intelligent Systems in Accounting, Finance and Management, V6, P195, DOI 10.1002/(SICI)1099-1174(199709)6:3<195::AID-ISAF132>3.0.CO;2-F
  • [8] CBR-Based Fuzzy Support Vector Machine for Financial Distress Prediction
    Cao, Yu
    Chen, Xiaohong
    [J]. JOURNAL OF TESTING AND EVALUATION, 2013, 41 (05) : 833 - 844
  • [9] Chen T., 2014, NIPS 2014 WORKSH HIG, V42, P69
  • [10] XGBoost: A Scalable Tree Boosting System
    Chen, Tianqi
    Guestrin, Carlos
    [J]. KDD'16: PROCEEDINGS OF THE 22ND ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2016, : 785 - 794