Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap

Cited by: 526
Authors
Kim, Ji-Hyun [1]
Affiliation
[1] Soongsil Univ, Dept Stat & Actuarial Sci, Seoul 156743, South Korea
Keywords
Compendex;
DOI
10.1016/j.csda.2009.04.009
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
We consider the accuracy estimation of a classifier constructed on a given training sample. The naive resubstitution estimate is known to have a downward bias problem. The traditional approach to tackling this bias problem is cross-validation. The bootstrap is another way to bring down the high variability of cross-validation. But a direct comparison of the two estimators, cross-validation and bootstrap, is not fair because the latter estimator requires much heavier computation. We performed an empirical study to compare the .632+ bootstrap estimator with the repeated 10-fold cross-validation and the repeated one-third holdout estimator. All the estimators were set to require about the same amount of computation. In the simulation study, the repeated 10-fold cross-validation estimator was found to have better performance than the .632+ bootstrap estimator when the classifier is highly adaptive to the training sample. We have also found that the .632+ bootstrap estimator suffers from a bias problem for large samples as well as for small samples. (C) 2009 Elsevier B.V. All rights reserved.
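The three estimators compared in the abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the toy nearest-centroid classifier, the synthetic two-class data, the function names, and the choice of 50 classifier fits per estimator (5 x 10-fold, 50 hold-out splits, 50 bootstrap samples) are all assumptions made here to put the methods on a roughly equal computational footing. The .632+ weighting follows a commonly used simplified form that clips the leave-one-out bootstrap error at the no-information rate; it may differ in detail from the definition used in the paper.

```python
import numpy as np

def nearest_centroid(X_tr, y_tr, X_te):
    """Toy classifier (an assumption): predict the class with the closest mean."""
    classes = np.unique(y_tr)
    centroids = np.stack([X_tr[y_tr == c].mean(axis=0) for c in classes])
    dists = ((X_te[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

def repeated_kfold_error(X, y, clf, k=10, repeats=5, rng=None):
    """Average the k-fold cross-validation error over several random partitions."""
    rng = np.random.default_rng(rng)
    n, errs = len(y), []
    for _ in range(repeats):
        wrong = 0
        for test in np.array_split(rng.permutation(n), k):
            train = np.setdiff1d(np.arange(n), test)
            wrong += np.sum(clf(X[train], y[train], X[test]) != y[test])
        errs.append(wrong / n)
    return float(np.mean(errs))

def repeated_holdout_error(X, y, clf, test_frac=1 / 3, repeats=50, rng=None):
    """Average the test error over repeated random train/test splits."""
    rng = np.random.default_rng(rng)
    n = len(y)
    n_test = int(round(n * test_frac))
    errs = []
    for _ in range(repeats):
        perm = rng.permutation(n)
        test, train = perm[:n_test], perm[n_test:]
        errs.append(np.mean(clf(X[train], y[train], X[test]) != y[test]))
    return float(np.mean(errs))

def bootstrap_632plus_error(X, y, clf, n_boot=50, rng=None):
    """Simplified .632+ estimate: blend resubstitution and out-of-bag error."""
    rng = np.random.default_rng(rng)
    n = len(y)
    pred_full = clf(X, y, X)
    resub = np.mean(pred_full != y)          # resubstitution (training) error
    wrong, seen = np.zeros(n), np.zeros(n)   # per-observation out-of-bag tallies
    for _ in range(n_boot):
        boot = rng.integers(0, n, size=n)    # sample n indices with replacement
        oob = np.setdiff1d(np.arange(n), boot)
        if oob.size == 0:
            continue
        wrong[oob] += clf(X[boot], y[boot], X[oob]) != y[oob]
        seen[oob] += 1
    mask = seen > 0
    err1 = np.mean(wrong[mask] / seen[mask])  # leave-one-out bootstrap error
    # No-information error rate: error expected if labels and predictions were independent.
    classes = np.unique(y)
    p = np.array([np.mean(y == c) for c in classes])
    q = np.array([np.mean(pred_full == c) for c in classes])
    gamma = np.sum(p * (1 - q))
    err1 = min(err1, gamma)
    # Relative overfitting rate R in [0, 1], then the .632+ weight w in [.632, 1].
    r = (err1 - resub) / (gamma - resub) if (err1 > resub and gamma > resub) else 0.0
    w = 0.632 / (1 - 0.368 * r)
    return float((1 - w) * resub + w * err1)

# Synthetic two-class Gaussian data (an assumption, for illustration only).
data_rng = np.random.default_rng(0)
n = 100
X = np.vstack([data_rng.normal(0.0, 1.0, (n // 2, 2)),
               data_rng.normal(1.5, 1.0, (n // 2, 2))])
y = np.repeat([0, 1], n // 2)

cv_err = repeated_kfold_error(X, y, nearest_centroid, rng=1)
ho_err = repeated_holdout_error(X, y, nearest_centroid, rng=1)
bs_err = bootstrap_632plus_error(X, y, nearest_centroid, rng=1)
print(f"repeated 10-fold CV: {cv_err:.3f}")
print(f"repeated 1/3 hold-out: {ho_err:.3f}")
print(f".632+ bootstrap: {bs_err:.3f}")
```

A nearest-centroid rule adapts only weakly to the training sample, so the three estimates should be close here. The abstract's finding concerns highly adaptive classifiers, for which the resubstitution error is near zero and the .632+ correction carries more of the estimate.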
Pages: 3735-3745
Number of pages: 11