Dynamic Financial Distress Prediction Using Combined LASSO and GBDT Algorithms

Authors
Jiao, Ziyi [1]
Affiliations
[1] School of Economics and Management, Henan Polytechnic Institute, Nanyang
Source
Informatica (Slovenia) | 2024 / Vol. 48 / No. 17
Keywords
concept drift; financial distress forecasting; GBDT algorithm; LASSO algorithm; similarity weighting
DOI
10.31449/inf.v48i17.6493
Abstract
With the global economy in a downward cycle under the influence of the epidemic, companies face crises in their business and financial conditions, and most are more likely to fall into financial distress in a poor economic environment. Concept drift degrades the practical performance of financial distress prediction, and existing approaches handle only limited types of drift. Most existing research on financial distress prediction relies on machine learning methods such as random forests, which have limitations when dealing with concept drift, including difficulty in updating the model and data imbalance. This study therefore proposes a model that combines the least absolute shrinkage and selection operator (LASSO) with the gradient boosting decision tree (GBDT) algorithm to address dynamic concept drift and accurately predict corporate financial distress. The study uses financial datasets of Chinese A-share listed companies from 2019 to 2022, with selection criteria including but not limited to market value, industry representativeness, and financial information. To reduce potential sample bias caused by market structure changes, policy adjustments, and other factors, the study adopts time-series and industry-stratified sampling to keep the samples representative. First, the two algorithms are analyzed in detail and applied to dynamic financial indicator selection on the financial samples. Second, a comprehensive prediction model is built using a sample similarity index. Experiments compare the model with a variety of base classifiers, including random forests, support vector machines, naive Bayes, logistic regression, single decision trees, and ordinary feedforward neural networks. The results show that the accuracy of the model in a dynamic environment is 92.47% and 92.31%, the F value is 85.33% and 85.12%, and the G value is 91.78%, 91.65% and 91.92%. The gradient boosting tree classifier performs best in accuracy, F value, and G value, with average gains of 0.051 in accuracy and 0.07 in F value, while its G value is stable but does not differ significantly. A Wilcoxon test shows that similarity weighting significantly improves the prediction effectiveness of most classifiers. The study reports handling dynamic concept drift effectively for the first time by combining the two algorithms with a sample similarity index. © 2024 Slovene Society Informatika. All rights reserved.
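The abstract reports accuracy, F value, and G value without defining them. The following is a minimal sketch of how such metrics are commonly computed from a binary confusion matrix, assuming the F value is the F1 score and the G value is the geometric mean of sensitivity and specificity. These definitions, the function names, and the toy labels are assumptions for illustration and are not taken from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = distressed, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def evaluate(y_true, y_pred):
    """Return accuracy, F value (assumed F1), and G value (assumed G-mean)."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f_value = 2 * precision * recall / (precision + recall)
    else:
        f_value = 0.0
    g_value = math.sqrt(recall * specificity)
    return {"accuracy": accuracy, "f_value": f_value, "g_value": g_value}


# Hypothetical toy data: 3 distressed and 7 healthy companies.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
scores = evaluate(y_true, y_pred)
print(scores)
```

Under these assumed definitions, the G value penalizes a classifier that ignores the minority (distressed) class even when accuracy is high, which is why it is often reported alongside accuracy on imbalanced samples.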
Pages: 139-152
Number of pages: 13