Model selection bias and Freedman's paradox

Cited by: 255
Authors
Lukacs, Paul M. [1 ]
Burnham, Kenneth P. [2 ]
Anderson, David R. [2 ]
Affiliations
[1] Colorado Div Wildlife, Ft Collins, CO 80526 USA
[2] Colorado State Univ, US Geol Survey, Colorado Cooperat Fish & Wildlife Res Unit, Ft Collins, CO 80523 USA
Keywords
Akaike's information criterion; Confidence interval coverage; Freedman's paradox; Model averaging; Model selection bias; Model selection uncertainty; Multimodel inference; Stepwise selection; REGRESSION;
DOI
10.1007/s10463-009-0234-4
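Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Discipline classification codes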
020208 ; 070103 ; 0714 ;
Abstract
In situations where limited knowledge of a system exists and the ratio of data points to variables is small, variable selection methods can often be misleading. Freedman (Am Stat 37:152-155, 1983) demonstrated how common it is to select completely unrelated variables as highly "significant" when the number of data points is similar in magnitude to the number of variables. A new type of model averaging estimator based on model selection with Akaike's AIC is used with linear regression to investigate the problems of likely inclusion of spurious effects and model selection bias, the bias introduced while using the data to select a single seemingly "best" model from an (often large) set of models employing many predictor variables. The new model averaging estimator helps reduce these problems and provides confidence interval coverage at the nominal level while traditional stepwise selection has poor inferential properties.
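The screening effect attributed to Freedman in the abstract can be illustrated with a small simulation. The following is a minimal Python sketch, not the authors' code: it assumes n = 100 observations and p = 50 pure-noise predictors, screens at roughly the 25% level, refits, and counts predictors that then look significant at roughly the 5% level. The cutoffs |t| > 1.15 and |t| > 1.96 are normal approximations rather than exact t quantiles, and the function names and defaults are hypothetical.

```python
import numpy as np


def t_statistics(X, y):
    """Return OLS t-statistics for the columns of X (an intercept is fitted but not reported)."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])           # design matrix with intercept
    beta, _, _, _ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    dof = n - Xd.shape[1]                           # residual degrees of freedom
    sigma2 = resid @ resid / dof                    # residual variance estimate
    cov = sigma2 * np.linalg.inv(Xd.T @ Xd)         # covariance of the coefficient estimates
    return beta[1:] / np.sqrt(np.diag(cov)[1:])


def freedman_screening(n=100, p=50, screen_t=1.15, final_t=1.96, seed=0):
    """Screen pure-noise predictors, refit, and count apparent 'significant' effects.

    Assumption: screen_t and final_t are normal-approximation cutoffs for
    two-sided 25% and 5% tests; exact t quantiles would differ slightly.
    Returns (number kept after screening, number 'significant' after refitting).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    y = rng.standard_normal(n)                      # y is independent of every column of X
    keep = np.abs(t_statistics(X, y)) > screen_t    # first pass: screening
    if not keep.any():
        return 0, 0
    t_refit = t_statistics(X[:, keep], y)           # second pass: refit on survivors only
    return int(keep.sum()), int((np.abs(t_refit) > final_t).sum())


if __name__ == "__main__":
    results = [freedman_screening(seed=s) for s in range(200)]
    mean_kept = np.mean([kept for kept, _ in results])
    mean_sig = np.mean([sig for _, sig in results])
    print(f"mean predictors kept after screening: {mean_kept:.1f}")
    print(f"mean 'significant' predictors after refit: {mean_sig:.1f}")
```

Because the response is generated independently of every predictor, any predictor counted as significant after the refit is spurious by construction; comparing the refit count with the 2.5 false positives expected from 50 unscreened tests at the 5% level gives a rough sense of the selection effect.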
Pages: 117-125
Page count: 9
Related papers
23 in total
[1] AKAIKE H, 1979, BIOMETRIKA, V66, P237, DOI 10.1093/biomet/66.2.237
[2] AKAIKE, H. LIKELIHOOD OF A TIME-SERIES MODEL [J]. JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES D-THE STATISTICIAN, 1978, 27 (3-4): 217-235
[3] AKAIKE H, 1998, SELECTED PAPERS HIRO, P199, DOI 10.1007/978-1-4612-1694-0_15
[4] Anderson D.R., 2008, MODEL BASED INFERENC
[5] [Anonymous], SAS VERS 8 02
[6] Buckland, ST; Burnham, KP; Augustin, NH. Model selection: An integral part of inference [J]. BIOMETRICS, 1997, 53 (02): 603-618
[7] Burnham, KP; Anderson, DR. Multimodel inference - understanding AIC and BIC in model selection [J]. SOCIOLOGICAL METHODS & RESEARCH, 2004, 33 (02): 261-304
[8] Burnham KP., 2002, MODEL SELECTION MULT, DOI 10.1007/B97636
[9] Claeskens G., 2008, Model Selection and Model Averaging, DOI 10.1017/CBO9780511790485
[10] FREEDMAN, DA. A NOTE ON SCREENING REGRESSION EQUATIONS [J]. AMERICAN STATISTICIAN, 1983, 37 (02): 152-155