Which Is Better: Holdout or Full-Sample Classifier Design?

Cited by: 3
Authors
Brun, Marcel [1]
Xu, Qian [2]
Dougherty, Edward R. [1, 2]
Affiliations
[1] Translational Genomics Research Institute, Computational Biology Division, Phoenix, AZ 85004, USA
[2] Texas A&M University, Department of Electrical and Computer Engineering, College Station, TX 77843, USA
DOI
10.1155/2008/297945
Chinese Library Classification (CLC) number
Q [Biological Sciences]
Subject classification codes
07; 0710; 09
Abstract
Is it better to design a classifier and estimate its error on the full sample or to design a classifier on a training subset and estimate its error on the holdout test subset? Full-sample design provides the better classifier; nevertheless, one might choose holdout with the hope of better error estimation. A conservative criterion to decide the best course is to aim at a classifier whose error is less than a given bound. Then the choice between full-sample and holdout designs depends on which possesses the smaller expected bound. Using this criterion, we examine the choice between holdout and several full-sample error estimators using covariance models and a patient-data model. Full-sample design consistently outperforms holdout design. The relation between the two designs is revealed via a decomposition of the expected bound into the sum of the expected true error and the expected conditional standard deviation of the true error. Copyright (C) 2008 Marcel Brun et al.
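The decomposition mentioned in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experiment. It assumes a toy model (two unit-covariance Gaussian classes), a nearest-mean linear classifier, resubstitution as the only full-sample estimator, and a bound taken as the conditional mean of the true error plus one conditional standard deviation given the estimate. All function names and parameter values are hypothetical, chosen only for illustration.

```python
import math
import numpy as np

def true_error(w, c, mu0, mu1):
    """Exact error of the rule 'label 1 if w.x > c' for N(mu0, I) vs N(mu1, I), equal priors."""
    norm = np.linalg.norm(w)
    phi = lambda z: 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF
    return 0.5 * (phi((w @ mu0 - c) / norm) + phi((c - w @ mu1) / norm))

def design(x0, x1):
    """Nearest-mean classifier: hyperplane halfway between the two sample means."""
    m0, m1 = x0.mean(axis=0), x1.mean(axis=0)
    w = m1 - m0
    return w, w @ (m0 + m1) / 2.0

def estimate(w, c, x0, x1):
    """Fraction of the given labelled points that the classifier gets wrong."""
    wrong = np.sum(x0 @ w > c) + np.sum(x1 @ w <= c)
    return wrong / (len(x0) + len(x1))

def expected_bound(true_errs, est_errs):
    """Return (E[true error], E[sd(true error | estimate)], their sum)."""
    true_errs, est_errs = np.asarray(true_errs), np.asarray(est_errs)
    cond_sd = 0.0
    for v in np.unique(est_errs):  # estimates take finitely many values k/n
        grp = true_errs[est_errs == v]
        cond_sd += grp.std() * len(grp) / len(true_errs)
    mean_err = true_errs.mean()
    return mean_err, cond_sd, mean_err + cond_sd

def simulate(n_per_class=20, dim=5, shift=0.5, reps=2000, seed=0):
    """Compare full-sample design (resubstitution) with a 50/50 holdout split."""
    rng = np.random.default_rng(seed)
    mu0, mu1 = np.zeros(dim), np.full(dim, shift)
    half = n_per_class // 2
    full, hold = ([], []), ([], [])
    for _ in range(reps):
        x0 = rng.normal(size=(n_per_class, dim)) + mu0
        x1 = rng.normal(size=(n_per_class, dim)) + mu1
        # Full-sample design: train and estimate on all points.
        w, c = design(x0, x1)
        full[0].append(true_error(w, c, mu0, mu1))
        full[1].append(estimate(w, c, x0, x1))
        # Holdout design: train on the first half, estimate on the second half.
        w, c = design(x0[:half], x1[:half])
        hold[0].append(true_error(w, c, mu0, mu1))
        hold[1].append(estimate(w, c, x0[half:], x1[half:]))
    return expected_bound(*full), expected_bound(*hold)

full_stats, hold_stats = simulate()
print("full-sample (mean error, E[cond. sd], bound):", full_stats)
print("holdout     (mean error, E[cond. sd], bound):", hold_stats)
```

In this toy setting the holdout classifier is trained on half the data, so its expected true error is higher. Whether the conditional standard deviation term offsets that gap depends on the model, sample size, and estimator, which is the question the paper studies with its own models.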
Pages: 8