Estimating classification error rate: Repeated cross-validation, repeated hold-out and bootstrap

Times cited: 547

Authors:

Kim, Ji-Hyun [1]

Affiliation:

[1] Soongsil Univ, Dept Stat & Actuarial Sci, Seoul 156743, South Korea

Source:

COMPUTATIONAL STATISTICS & DATA ANALYSIS | 2009 / Vol. 53 / No. 11

Keywords:

Compendex;

DOI:

10.1016/j.csda.2009.04.009

Chinese Library Classification (CLC) number:

TP39 [Computer applications]

Discipline classification codes:

081203; 0835

Abstract:

We consider the accuracy estimation of a classifier constructed on a given training sample. The naive resubstitution estimate is known to be biased downward. The traditional approach to tackling this bias is cross-validation. The bootstrap is an alternative that brings down the high variability of cross-validation. However, a direct comparison of the two estimators, cross-validation and the bootstrap, is not fair because the latter requires much heavier computation. We performed an empirical study comparing the .632+ bootstrap estimator with the repeated 10-fold cross-validation estimator and the repeated one-third hold-out estimator, with all estimators set to require about the same amount of computation. In the simulation study, the repeated 10-fold cross-validation estimator performed better than the .632+ bootstrap estimator when the classifier is highly adaptive to the training sample. We also found that the .632+ bootstrap estimator suffers from bias for large samples as well as for small samples. (C) 2009 Elsevier B.V. All rights reserved.
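To make the compared estimators concrete, here is a minimal plain-Python sketch of resubstitution, repeated k-fold cross-validation, repeated one-third hold-out, and the .632+ bootstrap in the form commonly attributed to Efron and Tibshirani (1997): the estimate is (1 - w)·err + w·Err1', where err is the resubstitution error, Err1' = min(Err1, γ) is the leave-one-out bootstrap error capped at the no-information rate γ, and w = .632 / (1 - .368·R) with R the relative overfitting rate. The 1-nearest-neighbour classifier (used here as an example of a rule that is highly adaptive to the training sample), the two-class Gaussian data, and every function name and default setting are illustrative assumptions, not the paper's simulation design.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical names for illustration only (not from the paper or any library).
Point = Tuple[float, ...]
Sample = List[Tuple[Point, int]]
Rule = Callable[[Point], int]

def fit_1nn(train: Sample) -> Rule:
    """Assumed example classifier: 1-nearest neighbour (highly adaptive)."""
    def predict(x: Point) -> int:
        # Label of the training point with the smallest squared Euclidean distance.
        _, label = min(train, key=lambda t: sum((a - b) ** 2 for a, b in zip(t[0], x)))
        return label
    return predict

def error_rate(rule: Rule, test: Sample) -> float:
    return sum(rule(x) != y for x, y in test) / len(test)

def resubstitution(data: Sample, fit=fit_1nn) -> float:
    # Train and test on the same sample: biased downward (exactly 0 for 1-NN).
    return error_rate(fit(data), data)

def repeated_kfold_cv(data: Sample, k=10, repeats=5, fit=fit_1nn,
                      rng: Optional[random.Random] = None) -> float:
    rng = rng or random.Random(0)
    n, errors = len(data), 0
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)  # a fresh random partition for each repeat
        for f in range(k):
            test_idx = set(idx[f::k])
            rule = fit([data[i] for i in idx if i not in test_idx])
            errors += sum(rule(data[i][0]) != data[i][1] for i in test_idx)
    # Each observation is predicted exactly once per repeat.
    return errors / (n * repeats)

def repeated_holdout(data: Sample, test_frac=1 / 3, repeats=50, fit=fit_1nn,
                     rng: Optional[random.Random] = None) -> float:
    rng = rng or random.Random(0)
    n = len(data)
    n_test = max(1, round(n * test_frac))
    total = 0.0
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        test = [data[i] for i in idx[:n_test]]
        train = [data[i] for i in idx[n_test:]]
        total += error_rate(fit(train), test)
    return total / repeats

def bootstrap_632_plus(data: Sample, B=50, fit=fit_1nn,
                       rng: Optional[random.Random] = None) -> float:
    rng = rng or random.Random(0)
    n = len(data)
    full_rule = fit(data)
    err_bar = error_rate(full_rule, data)  # resubstitution error
    # Leave-one-out bootstrap Err1: score point i only with rules whose
    # bootstrap sample did not contain it.
    loss, count = [0.0] * n, [0] * n
    for _ in range(B):
        draw = [rng.randrange(n) for _ in range(n)]
        rule, in_bag = fit([data[i] for i in draw]), set(draw)
        for i in range(n):
            if i not in in_bag:
                loss[i] += rule(data[i][0]) != data[i][1]
                count[i] += 1
    scored = [i for i in range(n) if count[i] > 0]
    if not scored:
        raise ValueError("increase B: no point was ever out of bag")
    err1 = sum(loss[i] / count[i] for i in scored) / len(scored)
    # No-information error rate: gamma = sum over classes l of p_l * (1 - q_l).
    preds = [full_rule(x) for x, _ in data]
    gamma = sum((sum(y == c for _, y in data) / n) * (1 - preds.count(c) / n)
                for c in {y for _, y in data})
    err1p = min(err1, gamma)
    # Relative overfitting rate R lies in [0, 1], so w lies in [0.632, 1].
    R = (err1p - err_bar) / (gamma - err_bar) if err1p > err_bar and gamma > err_bar else 0.0
    w = 0.632 / (1 - 0.368 * R)
    return (1 - w) * err_bar + w * err1p

def make_data(n: int, rng: random.Random) -> Sample:
    # Two overlapping Gaussian classes in 2-D (illustrative data only).
    return [((rng.gauss(float(i % 2), 1.0), rng.gauss(float(i % 2), 1.0)), i % 2)
            for i in range(n)]

if __name__ == "__main__":
    data = make_data(60, random.Random(42))
    print("resubstitution:", resubstitution(data))
    print("5 x 10-fold CV:", repeated_kfold_cv(data))
    print("50 x 1/3 hold-out:", repeated_holdout(data))
    print(".632+ bootstrap (B=50):", bootstrap_632_plus(data))
```

With these defaults, each resampling estimator fits the classifier about 50 times (5 repeats of 10-fold cross-validation, 50 hold-out splits, or B = 50 bootstrap samples plus one full-sample fit), in the spirit of the abstract's point that the estimators should be compared at roughly equal computational cost. These numbers are illustrative and are not the settings used in the paper.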


Pages: 3735-3745

Number of pages: 11
