CONDITIONAL PREDICTIVE INFERENCE POST MODEL SELECTION

Cited by: 18
Authors
Leeb, Hannes [1]
Affiliations
[1] Yale Univ, Dept Stat, New Haven, CT 06511 USA
Keywords
Predictive inference post model selection; regression with random design; conditional coverage probability; finite sample analysis; approximately honest and short prediction interval; BAYESIAN CONFIDENCE-INTERVALS; BREAST-CANCER; REGRESSION; ESTIMATORS; SETS; VARIABLES; APPROXIMATIONS; BALLS;
DOI
10.1214/08-AOS660
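Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification code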
020208 ; 070103 ; 0714 ;
Abstract
We give a finite-sample analysis of predictive inference procedures after model selection in regression with random design. The analysis is focused on a statistically challenging scenario where the number of potentially important explanatory variables can be infinite, where no regularity conditions are imposed on unknown parameters, where the number of explanatory variables in a "good" model can be of the same order as sample size and where the number of candidate models can be of larger order than sample size. The performance of inference procedures is evaluated conditional on the training sample. Under weak conditions on only the number of candidate models and on their complexity, and uniformly over all data-generating processes under consideration, we show that a certain prediction interval is approximately valid and short with high probability in finite samples, in the sense that its actual coverage probability is close to the nominal one and in the sense that its length is close to the length of an infeasible interval that is constructed by actually knowing the "best" candidate model. Similar results are shown to hold for predictive inference procedures other than prediction intervals, for example, tests of whether a future response will lie above or below a given threshold.
Pages: 2838-2876
Number of pages: 39
Related papers
37 in total
[1]  
Adam BL, 2002, CANCER RES, V62, P3609
[2]   Confidence balls in Gaussian regression [J].
Baraud, Y .
ANNALS OF STATISTICS, 2004, 32 (02) :528-551
[3]  
Barndorff-Nielsen O.E., 1996, BERNOULLI, V2, P319
[4]  
Beran R, 1998, ANN STAT, V26, P1826
[5]   HOW MANY VARIABLES SHOULD BE ENTERED IN A REGRESSION EQUATION [J].
BREIMAN, L ;
FREEDMAN, D .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1983, 78 (381) :131-136
[6]   Adaptive confidence balls [J].
Cai, TT ;
Low, MG .
ANNALS OF STATISTICS, 2006, 34 (01) :202-228
[7]   An adaptation theory for nonparametric confidence intervals [J].
Cai, TT ;
Low, MG .
ANNALS OF STATISTICS, 2004, 32 (05) :1805-1840
[8]   Prediction intervals, factor analysis models, and high-dimensional empirical linear prediction [J].
Ding, AA ;
Hwang, JTG .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 1999, 94 (446) :446-455
[9]  
Geisser S, 1993, MONOGRAPHS STAT APPL, V55
[10]   Adaptive confidence bands [J].
Genovese, Christopher ;
Wasserman, Larry .
ANNALS OF STATISTICS, 2008, 36 (02) :875-905