On properties of predictors derived with a two-step bootstrap model averaging approach: A simulation study in the linear regression model

Cited by: 19
Authors
Buchholz, Anika
Hollaender, Norbert [2]
Sauerbrei, Willi [1]
Affiliations
[1] Univ Med Ctr Freiburg, Inst Med Biometry & Med Informat, D-79104 Freiburg, Germany
[2] Nova Pharma AG, CH-4057 Basel, Switzerland
Keywords
bootstrap; model averaging; model selection uncertainty; linear regression; variable screening
DOI
10.1016/j.csda.2007.10.007
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
In many applications of model selection there is a large number of explanatory variables and thus a large set of candidate models. Selecting one single model for further inference ignores model selection uncertainty. Often several models fit the data equally well. However, these models may differ in terms of the variables included and might lead to different predictions. To account for model selection uncertainty, model averaging procedures have been proposed. Recently, an extended two-step bootstrap model averaging approach has been proposed. The first step of this approach is a screening step. It aims to eliminate variables with negligible effect on the outcome. In the second step the remaining variables are considered in bootstrap model averaging. A large simulation study is performed to compare the MSE and coverage rate of models derived with bootstrap model averaging, the full model, backward elimination using Akaike and Bayes information criterion and the model with the highest selection probability in bootstrap samples. In a data example, these approaches are also compared with Bayesian model averaging. Finally, some recommendations for the development of predictive models are given. (C) 2007 Published by Elsevier B.V.
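The two steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the screening rule, the selection criterion used inside each bootstrap sample, or how the averaged predictor is formed. The sketch assumes that screening keeps variables whose bootstrap inclusion frequency reaches a hypothetical `threshold`, that selection within each bootstrap sample is backward elimination by BIC, and that each selected model is refitted on the original data before the predictions are averaged. All function names and parameters are illustrative.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit with intercept; returns coefficients and a Gaussian BIC (up to a constant)."""
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    rss = float(np.sum((y - Xd @ beta) ** 2))
    n, k = Xd.shape
    return beta, n * np.log(rss / n) + k * np.log(n)


def backward_elimination(X, y, candidates):
    """Drop one variable at a time while doing so lowers the BIC (assumed criterion)."""
    active = list(candidates)
    _, best = ols_fit(X[:, active], y)
    while active:
        # BIC of every model obtained by removing a single active variable
        trials = [(ols_fit(X[:, [v for v in active if v != d]], y)[1], d)
                  for d in active]
        crit, drop = min(trials)
        if crit >= best:
            break
        best = crit
        active.remove(drop)
    return active


def two_step_bootstrap_average(X, y, X_new, n_boot=100, threshold=0.3, seed=0):
    """Screen variables by bootstrap inclusion frequency, then average predictions."""
    rng = np.random.default_rng(seed)
    n, p = X.shape

    # Step 1 (screening): count how often each variable survives selection.
    counts = np.zeros(p)
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample rows with replacement
        for v in backward_elimination(X[idx], y[idx], range(p)):
            counts[v] += 1
    kept = [v for v in range(p) if counts[v] / n_boot >= threshold]

    # Step 2 (averaging): select among the kept variables in fresh bootstrap
    # samples, refit each selected model on the original data (an assumption),
    # and average the resulting predictions with equal weight per sample.
    pred = np.zeros(len(X_new))
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)
        sel = backward_elimination(X[idx], y[idx], kept)
        beta, _ = ols_fit(X[:, sel], y)
        pred += np.column_stack([np.ones(len(X_new)), X_new[:, sel]]) @ beta
    return pred / n_boot, kept
```

Averaging with equal weight per bootstrap sample amounts to weighting each distinct model by its bootstrap selection frequency. Swapping BIC for AIC only requires replacing the penalty term `k * np.log(n)` with `2 * k`.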
Pages: 2778-2793
Number of pages: 16