Estimation of optimum thresholds for binary classification using genetic algorithm: An application to solve a credit scoring problem

Cited by: 5
Authors
Kazemi, Hamid Reza [1]
Khalili-Damghani, Kaveh [2]
Sadi-Nezhad, Soheil [3]
Affiliations
[1] Ind Management Inst, Dept Syst & Ind Engn, Tehran, Iran
[2] Islamic Azad Univ, Dept Ind Engn, South Tehran Branch, Tehran, Iran
[3] Univ Waterloo, Dept Stat & Actuarial Sci, Waterloo, ON, Canada
Keywords
classification problem; credit scoring; genetic algorithm; optimal cut-off point; optimal threshold value; performance criteria; BANKRUPTCY PREDICTION; NEURAL-NETWORKS; FINANCIAL RATIOS; HYBRID APPROACH; RISK; MODELS; PERFORMANCE; CLASSIFIERS; ENSEMBLE; LOGIT;
DOI
10.1111/exsy.13203
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The main issue in a classification problem is classifying observations into various disjoint classes. Different classification techniques generate a continuous number between a and b, usually between 0 and 1; thus, the optimal cut-off value(s) must be carefully selected to discriminate classes precisely. The decision is about setting a threshold value and transforming the continuous score into a binary output. Therefore, in addition to using the so-called sophisticated classification methods to have a more accurate classification, there is a need to identify and choose the optimal threshold value(s). However, the latter has not been thoroughly investigated. Hence, this study proposes an approach based on a Genetic Algorithm (GA) and Neural Networks (NNs) to automatically find customized cut-off values, considering different performance criteria and given datasets. Since credit scoring is a binary classification problem, two popular credit scoring datasets, namely "Australian" and "German" credit datasets, are used to test the proposed approach. Our numerical results revealed that the proposed GA-NN model could successfully find customized acceptance thresholds, considering predetermined performance criteria, including Accuracy, Estimated Misclassification Cost (EMC), and Area under ROC Curve (AUC) for the tested datasets. Furthermore, the best-obtained results and the paired-samples t-test results show that utilizing the customized cut-off points leads to a more accurate classification than the commonly-used threshold value of 0.5.
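The idea of searching for a cut-off rather than fixing it at 0.5 can be illustrated with a minimal sketch. This is not the authors' GA-NN model: it assumes the classifier scores are already available, uses synthetic data in place of the "Australian" and "German" datasets, and optimizes Accuracy only. The names synthetic_scores, accuracy, and ga_threshold, as well as every parameter value, are hypothetical choices made for illustration.

```python
import random


def synthetic_scores(n, seed=1):
    """Hypothetical stand-in for classifier outputs in [0, 1] (not real credit data)."""
    rng = random.Random(seed)
    labels = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
    # Positives tend to score higher than negatives, with overlap.
    scores = [rng.betavariate(4, 3) if y else rng.betavariate(2, 5) for y in labels]
    return scores, labels


def accuracy(scores, labels, threshold):
    """Fraction of observations classified correctly at a given cut-off."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def ga_threshold(scores, labels, fitness=accuracy, pop_size=30,
                 generations=40, mutation_sd=0.05, seed=0):
    """Toy real-coded GA with a single gene: the cut-off value in [0, 1]."""
    rng = random.Random(seed)
    population = [rng.random() for _ in range(pop_size)]
    # Seed the population with the conventional 0.5 cut-off; because the best
    # individual always survives, the result is never worse than 0.5.
    population[0] = 0.5
    for _ in range(generations):
        ranked = sorted(population,
                        key=lambda t: fitness(scores, labels, t),
                        reverse=True)
        elite = ranked[: pop_size // 2]  # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            w = rng.random()
            child = w * a + (1 - w) * b           # arithmetic crossover
            child += rng.gauss(0.0, mutation_sd)  # Gaussian mutation
            children.append(min(1.0, max(0.0, child)))  # clamp to [0, 1]
        population = elite + children
    return max(population, key=lambda t: fitness(scores, labels, t))


if __name__ == "__main__":
    scores, labels = synthetic_scores(400)
    best = ga_threshold(scores, labels)
    print("accuracy at 0.5 :", accuracy(scores, labels, 0.5))
    print("GA cut-off      :", best)
    print("accuracy at GA  :", accuracy(scores, labels, best))
```

A different performance criterion, such as a cost-weighted error, could be passed through the fitness argument, provided that larger values mean better performance.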
Pages: 27