STABILITY INVESTIGATIONS OF MULTIVARIABLE REGRESSION MODELS DERIVED FROM LOW- AND HIGH-DIMENSIONAL DATA

Times cited: 105
Authors
Sauerbrei, Willi [1]
Boulesteix, Anne-Laure [2]
Binder, Harald [1, 3]
Affiliations
[1] Univ Med Ctr Freiburg, Inst Med Biometry & Informat, D-79104 Freiburg, Germany
[2] Univ Munich, Dept Med Informat Biometry & Epidemiol, Munich, Germany
[3] Univ Freiburg, Freiburg Ctr Data Anal & Modeling, D-79106 Freiburg, Germany
Keywords
Complexity; High-dimensional data; Resampling; Stability; Variable selection; CROSS-VALIDATION; BOOTSTRAP METHODS; MICROARRAY DATA; ERROR ESTIMATION; SELECTION; PREDICTION; SURVIVAL; UNCERTAINTY; BIAS
DOI
10.1080/10543406.2011.629890
Chinese Library Classification (CLC) code
R9 [Pharmacy]
Subject classification code
1007
Abstract
Multivariable regression models can link a potentially large number of variables to various kinds of outcomes, such as continuous, binary, or time-to-event endpoints. Selecting important variables and choosing the functional form for continuous covariates are key parts of building such models, but both are notoriously difficult for several reasons. Because of multicollinearity between predictors and the limited amount of information in the data, instability of the selected models can be a serious issue. For applications with a moderate number of variables, resampling-based techniques have been developed for diagnosing and improving multivariable regression models. Deriving models from high-dimensional molecular data has created the need to adapt these techniques to settings where the number of variables is much larger than the number of observations. Three studies with a time-to-event outcome, one of which has high-dimensional data, are used to illustrate several techniques. Investigations at the covariate level and at the predictor level are seen to provide considerable insight into model stability and performance. While we indicate some areas where resampling techniques for model building still need further refinement, our case studies illustrate that these techniques can already be recommended for wider use.
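One widely used resampling diagnostic of the kind the abstract refers to is the bootstrap inclusion frequency: a variable-selection procedure is re-run on many bootstrap samples and the fraction of samples in which each variable is selected is recorded. The sketch below is a minimal, hypothetical illustration, not the authors' procedure. It assumes a continuous outcome and uses a simple correlation-screening rule (threshold 0.3) as a stand-in for the time-to-event models and selection methods studied in the paper; the function names, threshold, and simulated data are all illustrative assumptions.

```python
import random
from typing import Callable, List, Set

# Hypothetical type alias: a selector maps (X, y) to the set of selected column indices.
Selector = Callable[[List[List[float]], List[float]], Set[int]]


def pearson(x: List[float], y: List[float]) -> float:
    """Sample Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def screen_by_correlation(X: List[List[float]], y: List[float],
                          threshold: float = 0.3) -> Set[int]:
    """Hypothetical stand-in selection rule: keep columns with |r(x_j, y)| >= threshold."""
    p = len(X[0])
    return {j for j in range(p) if abs(pearson([row[j] for row in X], y)) >= threshold}


def bootstrap_inclusion_frequencies(X: List[List[float]], y: List[float],
                                    selector: Selector, n_boot: int = 200,
                                    seed: int = 0) -> List[float]:
    """Fraction of bootstrap samples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        for j in selector(Xb, yb):  # re-run the whole selection on the resample
            counts[j] += 1
    return [c / n_boot for c in counts]


# Simulated data (hypothetical): y depends strongly on x0, weakly on x1, not on x2..x4.
data_rng = random.Random(1)
n_obs, n_vars = 100, 5
X = [[data_rng.gauss(0, 1) for _ in range(n_vars)] for _ in range(n_obs)]
y = [1.0 * row[0] + 0.5 * row[1] + data_rng.gauss(0, 1) for row in X]

freqs = bootstrap_inclusion_frequencies(X, y, screen_by_correlation, n_boot=200, seed=2)
for j, f in enumerate(freqs):
    print(f"x{j}: inclusion frequency {f:.2f}")
```

In this simulated setting the population correlation of x0 with y is 1/1.5 ≈ 0.67, so x0 should be selected in nearly every resample. For x1 it is 0.5/1.5 ≈ 0.33, just above the threshold, so its inclusion frequency is expected to be intermediate. That is the kind of selection instability such diagnostics are meant to expose. Any other selection procedure with the same `(X, y) -> set of indices` shape could be passed as `selector`.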
Pages: 1206-1231
Page count: 26