cutpointr: Improved Estimation and Validation of Optimal Cutpoints in R

Cited by: 249
Authors
Thiele, Christian [1]
Hirschfeld, Gerrit [1]
Affiliations
[1] Univ Appl Sci Bielefeld, Interakt 1, D-33619 Bielefeld, Germany
Keywords
optimal cutpoint; ROC curve; bootstrap; R; cut points; cross-validation; performance; selection; package; bias
DOI
10.18637/jss.v098.i11
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
"Optimal cutpoints" for binary classification tasks are often established by testing which cutpoint yields the best discrimination, as measured for example by the Youden index, in a specific sample. This results in "optimal" cutpoints that are highly variable and systematically overestimate the out-of-sample performance. To address these concerns, the cutpointr package offers robust methods for estimating optimal cutpoints and the out-of-sample performance. The robust methods include bootstrapping and smoothing based on kernel estimation, generalized additive models, smoothing splines, and local regression. These methods can be applied to a wide range of binary-classification and cost-based metrics. cutpointr also provides mechanisms to utilize user-defined metrics and estimation methods. The package has capabilities for parallelization of the bootstrapping, including reproducible random number generation. Furthermore, it is pipe-friendly, for example for compatibility with functions from the tidyverse. Various functions for plotting receiver operating characteristic curves, precision-recall graphs, bootstrap results and other representations of the data are included. The package contains example data from a study on psychological characteristics and suicide attempts suitable for applying binary classification algorithms.
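The idea described in the abstract (choose the cutpoint that maximizes the Youden index in a sample, then use bootstrapping to see how variable that cutpoint is and how much the in-sample performance overstates the out-of-sample performance) can be illustrated with a minimal sketch. This is not the cutpointr API and not the authors' implementation: it is written in plain Python rather than R, the function names `youden`, `optimal_cutpoint`, and `bootstrap_validate` are hypothetical, the toy data are invented for illustration, and out-of-bag observations are assumed here as the stand-in for out-of-sample data.

```python
import random


def youden(scores, labels, cut):
    """Youden index J = sensitivity + specificity - 1.

    Assumption: observations with score >= cut are predicted positive (label 1).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    sensitivity = sum(1 for s in pos if s >= cut) / len(pos)
    specificity = sum(1 for s in neg if s < cut) / len(neg)
    return sensitivity + specificity - 1.0


def optimal_cutpoint(scores, labels):
    """Return the observed score that maximizes the Youden index in this sample."""
    candidates = sorted(set(scores))
    # max() keeps the first maximum, so ties resolve to the lowest cutpoint.
    return max(candidates, key=lambda c: youden(scores, labels, c))


def bootstrap_validate(scores, labels, runs=200, seed=1):
    """Re-estimate the cutpoint on bootstrap resamples.

    Each cutpoint is evaluated in-bag (on the resample it was chosen from) and
    out-of-bag (on the observations the resample left out).
    """
    rng = random.Random(seed)  # fixed seed for reproducible resampling
    n = len(scores)
    cuts, in_bag, out_of_bag = [], [], []
    for _ in range(runs):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [i for i in range(n) if i not in chosen]
        b_scores = [scores[i] for i in idx]
        b_labels = [labels[i] for i in idx]
        o_scores = [scores[i] for i in oob]
        o_labels = [labels[i] for i in oob]
        # Skip resamples in which either part lacks one of the two classes.
        if len(set(b_labels)) < 2 or len(set(o_labels)) < 2:
            continue
        cut = optimal_cutpoint(b_scores, b_labels)
        cuts.append(cut)
        in_bag.append(youden(b_scores, b_labels, cut))
        out_of_bag.append(youden(o_scores, o_labels, cut))
    return cuts, in_bag, out_of_bag


# Invented toy data: a higher score is assumed to indicate the positive class.
toy_scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1, 1]

best_cut = optimal_cutpoint(toy_scores, toy_labels)
cuts, in_bag, out_of_bag = bootstrap_validate(toy_scores, toy_labels)
print("in-sample optimal cutpoint:", best_cut)
print("distinct bootstrap cutpoints:", sorted(set(cuts)))
print("mean in-bag Youden:", sum(in_bag) / len(in_bag))
print("mean out-of-bag Youden:", sum(out_of_bag) / len(out_of_bag))
```

Comparing the spread of the bootstrap cutpoints and the gap between the in-bag and out-of-bag means gives a rough picture of the variability and optimism the abstract refers to. The smoothing-based methods mentioned there (kernel estimation, generalized additive models, smoothing splines, local regression) are not covered by this sketch.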
Pages: 27