A comparison of model selection methods for prediction in the presence of multiply imputed data

Times cited: 31
Authors
Le Thi Phuong Thao [1]
Geskus, Ronald [1, 2]
Affiliations
[1] Univ Oxford, Biostat Grp, Clin Res Unit, Ho Chi Minh City, Vietnam
[2] Univ Oxford, Nuffield Dept Med, Oxford, England
Funding
Wellcome Trust (UK)
Keywords
lasso; multiply imputed data; prediction; stacked data; variable selection
DOI
10.1002/bimj.201700232
Chinese Library Classification (CLC) number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
Many approaches have been proposed for variable selection with multiply imputed (MI) data when developing a prognostic model, but no method is uniformly best. We conducted a simulation study with a binary outcome and a logistic regression model to compare two classes of variable selection methods in the presence of MI data: (I) model selection on bootstrap data, using backward elimination based on AIC or the lasso, with the final model fitted on the variables selected most frequently (e.g. in ≥ 50% of cases) across all MI and bootstrap data sets; and (II) model selection on the original MI data, using the lasso. In approach II, the final model is obtained by (i) averaging the estimates of variables selected in any MI data set, or (ii) in at least 50% of the MI data sets; (iii) performing the lasso on the stacked MI data; or (iv) as in (iii), but using individual weights determined by the fraction of missingness. In all lasso models, we used both the optimal penalty (minimizing the cross-validated error) and the one-standard-error (1-se) rule. To correct for overshrinkage caused by the suboptimal penalty, we considered recalibrating models by refitting either the linear predictor or all individual variables. We applied the methods to a real data set of 951 adult patients with tuberculous meningitis to predict mortality within nine months. Overall, lasso selection with the 1-se penalty showed the best performance in both approach I and approach II. Stacking MI data is an attractive approach because it does not require choosing a selection threshold when combining results from separate MI data sets.
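Three of the combination steps described in the abstract (the selection-frequency threshold across MI and bootstrap data sets, the 1-se choice of lasso penalty, and per-subject weights for stacked MI data) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-data-set selections and cross-validation curves are assumed to come from separate lasso or backward-AIC fits done elsewhere, and the stacking weight (1 - f_i)/m, where f_i is subject i's fraction of missing values and m the number of imputations, is one common choice assumed here because the abstract does not state the exact formula. All function and variable names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def selection_frequencies(selected_sets: Sequence[Set[str]]) -> Dict[str, float]:
    """Fraction of data sets (MI and/or bootstrap) in which each variable was selected."""
    n_sets = len(selected_sets)
    if n_sets == 0:
        raise ValueError("need at least one selection result")
    counts: Dict[str, int] = {}
    for chosen in selected_sets:
        for var in chosen:
            counts[var] = counts.get(var, 0) + 1
    return {var: c / n_sets for var, c in counts.items()}


def final_variables(selected_sets: Sequence[Set[str]], threshold: float = 0.5) -> List[str]:
    """Keep variables selected in at least `threshold` of the data sets (e.g. >= 50%)."""
    freqs = selection_frequencies(selected_sets)
    return sorted(v for v, f in freqs.items() if f >= threshold)


def one_se_lambda(lambdas: Sequence[float],
                  cv_mean: Sequence[float],
                  cv_se: Sequence[float]) -> float:
    """1-se rule: the largest penalty whose mean CV error lies within one
    standard error of the minimum mean CV error (lower error is better)."""
    if not (len(lambdas) == len(cv_mean) == len(cv_se)) or not lambdas:
        raise ValueError("lambdas, cv_mean and cv_se must be non-empty and equal length")
    best = min(range(len(lambdas)), key=lambda k: cv_mean[k])
    cutoff = cv_mean[best] + cv_se[best]
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= cutoff)


def stacking_weights(missing_fraction: Sequence[float], m: int) -> List[float]:
    """Assumed weight (1 - f_i) / m given to each of subject i's m stacked rows."""
    if m <= 0:
        raise ValueError("m must be positive")
    return [(1.0 - f) / m for f in missing_fraction]


if __name__ == "__main__":
    # Hypothetical selections from four (MI x bootstrap) data sets.
    sets = [{"x1", "x2"}, {"x1"}, {"x1", "x3"}, {"x1", "x2"}]
    print(final_variables(sets))  # ['x1', 'x2']
    # Hypothetical CV curve: mean deviance and its standard error per penalty.
    print(one_se_lambda([0.1, 0.05, 0.02, 0.01],
                        [0.70, 0.615, 0.60, 0.61],
                        [0.03, 0.03, 0.02, 0.02]))  # 0.05
```

Because stacking fits a single lasso on all m imputed copies at once (with weights such as these passed as observation weights), it needs no `threshold` argument like `final_variables`, which is the advantage the abstract attributes to the stacked approach.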
Pages: 343-356
Page count: 14
Related papers
50 in total
  • [41] FROM MODEL SELECTION TO MODEL AVERAGING: A COMPARISON FOR NESTED LINEAR MODELS
    Xu, Wenchao
    Zhang, Xinyu
    ECONOMETRIC THEORY, 2025
  • [42] Comparison Between Linear and Non-linear Variable Selection Methods with Applications to Spectroscopic (UV-Vis/NIR) Data
    Krongchai, Chanida
    Wongsaipun, Sakunna
    Funsueb, Sujitra
    Theanjumpol, Parichat
    Jakmunee, Jaroon
    Kittiwachana, Sila
    CHIANG MAI JOURNAL OF SCIENCE, 2020, 47 (01) : 160 - 174
  • [43] Improving Lasso for model selection and prediction
    Pokarowski, Piotr
    Rejchel, Wojciech
    Soltys, Agnieszka
    Frej, Michal
    Mielniczuk, Jan
    SCANDINAVIAN JOURNAL OF STATISTICS, 2022, 49 (02) : 831 - 863
  • [44] Adaptive Bayesian SLOPE: Model Selection With Incomplete Data
    Jiang, Wei
    Bogdan, Malgorzata
    Josse, Julie
    Majewski, Szymon
    Miasojedow, Blazej
    Rockova, Veronika
    JOURNAL OF COMPUTATIONAL AND GRAPHICAL STATISTICS, 2022, 31 (01) : 113 - 137
  • [45] MODEL SELECTION FOR CORRELATED DATA WITH DIVERGING NUMBER OF PARAMETERS
    Cho, Hyunkeun
    Qu, Annie
    STATISTICA SINICA, 2013, 23 (02) : 901 - 927
  • [46] Comparison of Gene Selection Methods for Clustering Single-cell RNA-seq Data
    Zhu, Xiaoshu
    Wang, Jianxin
    Li, Rongruan
    Peng, Xiaoqing
    CURRENT BIOINFORMATICS, 2023, 18 (01) : 1 - 11
  • [47] A clonal selection algorithm model for daily rainfall data prediction
    Rodi, N. S. Noor
    Malek, M. A.
    Ismail, Amelia Ritahani
    Ting, Sie Chun
    Tang, Chao-Wei
    WATER SCIENCE AND TECHNOLOGY, 2014, 70 (10) : 1641 - 1647
  • [48] QSAR analysis of diaryl COX-2 inhibitors: Comparison of feature selection and train-test data selection methods
    Soltani, Somaieh
    Abolhasani, Hoda
    Zarghi, Afshin
    Jouyban, Abolghasem
    EUROPEAN JOURNAL OF MEDICINAL CHEMISTRY, 2010, 45 (07) : 2753 - 2760
  • [49] Variable selection in the presence of missing data: resampling and imputation
    Long, Qi
    Johnson, Brent A.
    BIOSTATISTICS, 2015, 16 (03) : 596 - 610
  • [50] Backward-in-Time Selection of the Order of Dynamic Regression Prediction Model
    Vlachos, Ioannis
    Kugiumtzis, Dimitris
    JOURNAL OF FORECASTING, 2013, 32 (08) : 685 - 701