Predicting financial distress in high-dimensional imbalanced datasets: a multi-heterogeneous self-paced ensemble learning framework

Citations: 0
Authors
Gao, Ruize [1, 2]
Cui, Shaoze [3]
Wang, Yu [4]
Xu, Wei [5]
Affiliations
[1] Tsinghua Univ, Beijing, Peoples R China
[2] Beijing Inst Math Sci & Applicat, Beijing, Peoples R China
[3] Beijing Inst Technol, Beijing, Peoples R China
[4] Chongqing Univ, Chongqing, Peoples R China
[5] Jiangnan Univ, Wuxi, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
Financial distress prediction; Feature selection; Imbalanced data; Ensemble learning; Particle swarm optimization; BUSINESS FAILURE PREDICTION; SUPPORT VECTOR MACHINE; FEATURE-SELECTION; DISCRIMINANT-ANALYSIS; NEURAL-NETWORKS; OPTIMIZATION; INFORMATION; RATIOS; FILTER; MODEL;
DOI
10.1186/s40854-024-00745-w
Chinese Library Classification number
F8 [Public Finance, Finance];
Discipline classification code
0202;
Abstract
Financial distress prediction (FDP) is a critical area of study for researchers, industry stakeholders, and regulatory authorities. However, FDP tasks present several challenges, including high-dimensional datasets, class imbalance, and the complexity of parameter optimization. These issues often hinder a predictive model's ability to accurately identify companies at high risk of financial distress. To mitigate these challenges, we introduce FinMHSPE, a novel multi-heterogeneous self-paced ensemble (MHSPE) learning framework for FDP. The proposed model uses pairwise comparisons of data from multiple time frames, combined with the maximum relevance and minimum redundancy method, to select an optimal subset of features, which addresses the high-dimensionality issue. Furthermore, the proposed framework incorporates the MHSPE model to iteratively identify the most informative majority-class samples, which addresses the class imbalance issue. To optimize the model's parameters, we use the particle swarm optimization algorithm. The robustness of the proposed model is validated through extensive experiments on a financial dataset of Chinese listed companies. The empirical results demonstrate that the proposed model outperforms existing competing models in the field of FDP. Specifically, the FinMHSPE framework achieves the highest performance, with an area under the curve (AUC) value of 0.9574, considerably surpassing all existing methods. A comparative analysis of AUC values further reveals that FinMHSPE outperforms state-of-the-art approaches that rely on financial features as inputs. Furthermore, our investigation identifies several valuable features for enhancing FDP model performance, notably those associated with a company's information and growth potential.
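The abstract states that particle swarm optimization is used to tune the model's parameters but gives no configuration. The following is a minimal sketch of a generic PSO loop, not the authors' implementation. The function names (`pso_minimize`, `toy_loss`), the inertia and acceleration coefficients, the bounds, and the toy objective are all assumptions made for illustration; in practice the objective would be something like a validation loss (for example, 1 - AUC) of the classifier being tuned.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iters=100,
                 inertia=0.7, c_cognitive=1.5, c_social=1.5, seed=0):
    """Generic particle swarm minimizer (illustrative; hyperparameters are assumed)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialize positions uniformly inside the bounds, with zero velocity.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    # Each particle remembers its personal best; the swarm shares a global best.
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull toward personal best,
                # and pull toward the global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_cognitive * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                # Move the particle and clamp it to the search bounds.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_loss(params):
    # Hypothetical stand-in for a validation loss over two hyperparameters;
    # its minimum is at (0.3, 5.0) by construction.
    x, y = params
    return (x - 0.3) ** 2 + (y - 5.0) ** 2


best_params, best_loss = pso_minimize(toy_loss, bounds=[(0.0, 1.0), (1.0, 10.0)])
```

Replacing `toy_loss` with a function that trains a model on the given parameters and returns its validation loss would turn this into a hyperparameter search; the fixed seed only makes the sketch reproducible.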
Pages: 34