THE RESTRICTED CONSISTENCY PROPERTY OF LEAVE-n(v)-OUT CROSS-VALIDATION FOR HIGH-DIMENSIONAL VARIABLE SELECTION

Cited by: 10
Authors
Feng, Yang [1]
Yu, Yi [2]
Affiliations
[1] Columbia Univ, Dept Stat, New York, NY 10027 USA
[2] Univ Bristol, Sch Math, Bristol BS8 1TH, Avon, England
Keywords
Generalized linear models; leave-n(v)-out cross-validation; restricted maximum likelihood estimators; restricted model-selection consistency; variable selection; TUNING PARAMETER SELECTION; PENALIZED LIKELIHOOD; GENE-EXPRESSION; MODEL SELECTION; ERROR RATE; PREDICTION; REGRESSION; PATH;
DOI
10.5705/ss.202015.0394
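Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract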
Cross-validation (CV) methods are popular for selecting the tuning parameter in high-dimensional variable selection problems. We show that a misalignment of the CV is one possible reason for its over-selection behavior. To fix this issue, we propose using a version of leave-n(v)-out CV (CV(n(v))) to select the optimal model from a restricted candidate model set for high-dimensional generalized linear models. By using the same candidate model sequence and a proper order for the construction sample size n(c) in each CV split, CV(n(v)) avoids potential problems when developing theoretical properties. CV(n(v)) is shown to exhibit the restricted model-selection consistency property under mild conditions. Extensive simulations and a real-data analysis support the theoretical results and demonstrate the performance of CV(n(v)) in terms of both model selection and prediction.
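The selection rule summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian linear model refit by ordinary least squares in place of the restricted maximum likelihood estimators for generalized linear models, random (Monte Carlo) splits rather than all possible splits, and a user-supplied construction sample size n_c, since the abstract does not state the required order of n_c. The names `ols_fit` and `cv_nv_select` and the simulated data are hypothetical.

```python
import random


def ols_fit(X, y):
    """Least-squares coefficients via normal equations and Gauss-Jordan elimination."""
    p = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(p)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def cv_nv_select(X, y, candidates, n_c, n_splits=50, seed=0):
    """Leave-n_v-out CV over a fixed candidate model sequence.

    Each split fits every candidate on n_c construction points and scores it
    on the remaining n_v = n - n_c validation points (squared error here).
    The same candidate sequence is reused in every split.
    """
    rng = random.Random(seed)
    n = len(y)
    losses = [0.0] * len(candidates)
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        build, valid = idx[:n_c], idx[n_c:]
        for k, cols in enumerate(candidates):
            Xb = [[X[i][j] for j in cols] for i in build]
            beta = ols_fit(Xb, [y[i] for i in build])
            for i in valid:
                pred = sum(b * X[i][j] for b, j in zip(beta, cols))
                losses[k] += (y[i] - pred) ** 2
    best = min(range(len(candidates)), key=losses.__getitem__)
    return candidates[best], losses


# Simulated data (assumption): only variables 0 and 1 carry signal.
data_rng = random.Random(1)
n, p = 60, 5
X = [[data_rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + data_rng.gauss(0, 0.5) for row in X]

# Fixed, nested candidate sequence shared by all splits.
candidates = [[0], [0, 1], [0, 1, 2], [0, 1, 2, 3], [0, 1, 2, 3, 4]]
selected, losses = cv_nv_select(X, y, candidates, n_c=15)
print("selected variables:", selected)
```

In this sketch the construction set (15 points) is deliberately much smaller than the validation set (45 points); that choice is illustrative only and is not taken from the paper.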
Pages: 1607-1630
Number of pages: 24