Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework

Cited by: 0
Authors
Gao, Ruize [1 ,2 ]
Cui, Shaoze [3 ]
Wang, Yu [4 ]
Xu, Wei [5 ]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Beijing Inst Math Sci & Applicat, Beijing, Peoples R China
[3] Beijing Inst Technol, Beijing, Peoples R China
[4] Chongqing Univ, Chongqing, Peoples R China
[5] Jiangnan Univ, Wuxi, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Financial distress prediction; Feature selection; Imbalanced data; Ensemble learning; Particle swarm optimization; BUSINESS FAILURE PREDICTION; SUPPORT VECTOR MACHINE; FEATURE-SELECTION; DISCRIMINANT-ANALYSIS; NEURAL-NETWORKS; OPTIMIZATION; INFORMATION; RATIOS; FILTER; MODEL;
DOI
10.1186/s40854-024-00745-w
Chinese Library Classification (CLC) number
F8 [Public Finance, Finance];
Subject classification code
0202 ;
Abstract
Financial distress prediction (FDP) is a critical area of study for researchers, industry stakeholders, and regulatory authorities. However, FDP tasks present several challenges, including high-dimensional datasets, class imbalance, and the complexity of parameter optimization. These issues often hinder a predictive model's ability to accurately identify companies at high risk of financial distress. To mitigate these challenges, we introduce FinMHSPE, a novel multi-heterogeneous self-paced ensemble (MHSPE) learning framework for FDP. The proposed model uses pairwise comparisons of data from multiple time frames, combined with the maximum relevance and minimum redundancy method, to select an optimal subset of features, which addresses the high-dimensionality issue. The framework also incorporates the MHSPE model to iteratively identify the most informative majority-class samples, which addresses the class imbalance issue. To optimize the model's parameters, we use the particle swarm optimization algorithm. The robustness of the proposed model is validated through extensive experiments on a financial dataset of Chinese listed companies. The empirical results demonstrate that the proposed model outperforms competing FDP models. Specifically, FinMHSPE achieves the highest area under the curve (AUC) value, 0.9574, considerably surpassing all existing methods in the comparison. A comparative analysis of AUC values further reveals that FinMHSPE outperforms state-of-the-art approaches that rely on financial features as inputs. Finally, our investigation identifies several features that are valuable for enhancing FDP model performance, notably those associated with a company's information and growth potential.
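The abstract names particle swarm optimization as the parameter tuner but does not state the variant, the search space, or the objective. As an illustration only, the following is a minimal sketch of standard global-best PSO in plain Python. Everything in it is an assumption rather than the authors' implementation: the function name `pso_minimize`, the coefficients `w`, `c1`, and `c2`, the two-parameter search box, and the `toy_loss` objective, which merely stands in for a validation loss such as 1 - AUC.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO over a box; returns (best_position, best_value)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start particles uniformly inside the box with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Move, then clamp the coordinate back into the box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(params):
    # Hypothetical stand-in for a validation loss; minimum at (0.3, 5.0).
    x, y = params
    return (x - 0.3) ** 2 + (y - 5.0) ** 2


best_params, best_loss = pso_minimize(toy_loss, [(0.0, 1.0), (1.0, 10.0)])
```

In a real tuning loop, `toy_loss` would be replaced by a function that trains the ensemble with the candidate parameters and returns a cross-validated loss, which makes each evaluation far more expensive than in this toy setting.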
Pages: 34