Multiple predicting K-fold cross-validation for model selection

Cited by: 306
Authors
Jung, Yoonsuh [1]
Affiliations
[1] Korea Univ, Dept Stat, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Cross-validation; K-fold cross-validation; model selection; tuning parameter selection; VARIABLE SELECTION; REGULARIZATION PATHS; LASSO; CONSISTENCY; SHRINKAGE;
DOI
10.1080/10485252.2017.1404598
Chinese Library Classification
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
K-fold cross-validation (CV) is widely adopted as a model selection criterion. In K-fold CV, (K - 1) folds are used for model construction and the hold-out fold is allocated to model validation. This implies that model construction is emphasised more than model validation. However, some studies have revealed that placing more emphasis on the validation procedure may result in improved model selection. Specifically, leave-m-out CV with n samples may achieve variable-selection consistency when m/n approaches 1. In this study, a new CV method is proposed within the framework of K-fold CV. The proposed method uses (K - 1) folds of the data for model validation, while the remaining fold is used for model construction. This provides (K - 1) predicted values for each observation. These values are averaged to produce a final predicted value. Model selection based on the averaged predicted values can then reduce the variation in the assessment, owing to the averaging. The variable-selection consistency of the suggested method is established. Its advantages over K-fold CV with finite samples are examined under linear, non-linear, and high-dimensional models.
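The procedure described in the abstract can be sketched in a few lines. The following is a minimal illustration based only on the abstract, not the author's implementation: it assumes ordinary least squares as the fitted model and mean squared error of the averaged predictions as the selection score, and the function name `mp_kfold_cv_error` and its parameters are hypothetical.

```python
import numpy as np


def mp_kfold_cv_error(X, y, K=5, seed=0):
    """Score a candidate model by fitting on ONE fold and validating on the other K - 1.

    Assumptions (not from the paper): OLS fit, squared-error loss, random folds.
    """
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), K)

    pred_sum = np.zeros(n)  # running sum of the predictions for each observation

    for construct_idx in folds:
        # Every observation outside the construction fold is used for validation.
        valid_mask = np.ones(n, dtype=bool)
        valid_mask[construct_idx] = False

        # Fit the model on the single construction fold only.
        beta, *_ = np.linalg.lstsq(X[construct_idx], y[construct_idx], rcond=None)
        pred_sum[valid_mask] += X[valid_mask] @ beta

    # Each observation sits in exactly one fold, so it was predicted K - 1 times.
    avg_pred = pred_sum / (K - 1)
    return float(np.mean((y - avg_pred) ** 2))
```

To compare candidate variable subsets, one would call the function once per subset (for example `mp_kfold_cv_error(X[:, cols], y)`) and keep the subset with the smallest score. Because each model is fitted on only about n/K observations, every candidate must be estimable from a single fold.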
Pages: 197-215
Number of pages: 19