Data Sample Selection Issues for Bankruptcy Prediction

Cited by: 14
Authors
Tian, Shaonan [1]
Yu, Yan [2]
Zhou, Ming [3]
Affiliations
[1] San Jose State Univ, Decis Sci, San Jose, CA 95192 USA
[2] Univ Cincinnati, Business Analyt, Cincinnati, OH 45221 USA
[3] San Jose State Univ, Operat & Supply Chain Management, San Jose, CA 95192 USA
Source
RISK HAZARDS & CRISIS IN PUBLIC POLICY | 2015 / Vol. 6 / Issue 01
Keywords
bankruptcy forecasting; binary classification; logistic regression; neural networks; support vector machines;
DOI
10.1002/rhc3.12071
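Chinese Library Classification (CLC) number
C93 [Management Science]; D035 [State Administration]; D523 [Administrative Management]; D63 [State Administration];
Discipline classification code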
12 ; 1201 ; 1202 ; 120202 ; 1204 ; 120401 ;
Abstract
Bankruptcy prediction is of paramount interest to both academics and practitioners. This paper devotes special care to an important aspect of bankruptcy prediction modeling: the data sample selection issue. To investigate the effect of different data selection methods, three models are adopted: logistic regression, Neural Networks (NNET), and Support Vector Machines (SVM), which have recently gained some popularity in applications. A Monte Carlo simulation study and an empirical analysis on an updated bankruptcy database are conducted to explore the effect of different data sample selection methods. By comparing out-of-sample predictive performance, we conclude that if forecasting the probability of bankruptcy is of interest, the complete data sampling technique provides more accurate results. However, if a binary bankruptcy decision or classification is desired, the choice-based sampling technique may still be suitable. In particular, choice-based data samples validated by NNET and SVM can capture more correct predictions of bankruptcy observations and provide a lower asymmetric misclassification rate. In addition, for different choice-based data samples, it is essential to adjust the cut-off probability. An appropriate choice of cut-off probability depends on the specification of the cost ratio between the Type I error and the Type II error. The optimal cut-off probability proposed in this work is a function of the data sample selection method and the cost ratio.
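The dependence of the cut-off probability on both the cost ratio and the sampling scheme can be illustrated with a textbook cost-sensitive threshold combined with the standard odds correction for choice-based samples. This is a minimal sketch and not the cut-off formula proposed in the paper, which this record does not state. The function names, the convention that a Type I error means a bankrupt firm classified as healthy, and the numerical rates in the usage note are all assumptions made for illustration.

```python
def population_cutoff(cost_ratio):
    """Cost-minimizing cut-off on the population probability scale.

    cost_ratio = (cost of Type I error) / (cost of Type II error).
    Assumption: Type I = a bankrupt firm classified as healthy.
    A firm is flagged as bankrupt when its probability exceeds the cut-off.
    """
    return 1.0 / (1.0 + cost_ratio)


def sample_cutoff(cost_ratio, sample_rate, population_rate):
    """Equivalent cut-off for a model fitted on a choice-based sample.

    sample_rate:     share of bankrupt firms in the (oversampled) training data.
    population_rate: share of bankrupt firms in the full population.
    Oversampling bankruptcies multiplies the fitted odds by a constant factor,
    so the odds threshold is shifted by that same factor.
    """
    odds_threshold_population = 1.0 / cost_ratio
    odds_shift = (sample_rate / (1.0 - sample_rate)) * (
        (1.0 - population_rate) / population_rate
    )
    odds_threshold_sample = odds_threshold_population * odds_shift
    return odds_threshold_sample / (1.0 + odds_threshold_sample)
```

As a usage illustration with hypothetical numbers: for a matched sample with `sample_rate=0.5`, a population bankruptcy rate of `0.01`, and equal error costs (`cost_ratio=1`), `sample_cutoff` gives a cut-off near 0.99 rather than 0.5, which shows why a fixed 0.5 threshold can mislead on choice-based data.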
Pages: 91-116
Page count: 26