A simple pooling method for variable selection in multiply imputed datasets outperformed complex methods

Cited by: 12
Authors
Panken, A. M. [1, 2]
Heymans, M. W. [1]
Affiliations
[1] Vrije Univ Amsterdam, Amsterdam Publ Hlth Res Inst, Amsterdam UMC, Dept Epidemiol & Data Sci, Amsterdam, Netherlands
[2] Phys Therapy Practice Panken, Roermond, Netherlands
Keywords
Logistic regression; Median-p-rule; Multiple imputation; Pooling selection methods; Variable selection; IMPUTATION; VALUES
DOI
10.1186/s12874-022-01693-8
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
Background For the development of prognostic models after multiple imputation, it is advised to apply variable selection to the pooled model. This study uses a simulation study and a practical data example to evaluate the performance of four pooling methods for variable selection in multiply imputed datasets: D1, D2, D3, and the recently extended Median-P-Rule (MPR) for categorical, dichotomous, and continuous variables in logistic regression models.
Methods Four datasets (n = 200 and n = 500), with 9 variables and correlations of 0.2 and 0.6 between these variables, were simulated. These datasets included 2 categorical and 2 continuous variables with 20% of data missing at random. Multiple Imputation (m = 5) was applied, and the four methods were compared with selection from the full model (without missing data). The same analyses were repeated in five multiply imputed real-world datasets (NHANES) (m = 5, p = 0.05, N = 250/300/400/500/1000).
Results In the simulated datasets, the differences between the pooling methods were most evident in the smaller datasets. The MPR performed equally to all other pooling methods for the selection frequency and for the P-values of the continuous and dichotomous variables. However, the MPR performed consistently better for pooling and selecting categorical variables in multiply imputed datasets, and also regarding the stability of the selected prognostic models. Analyses in the NHANES dataset showed that all methods mostly selected the same models. Compared with each other, however, the D2 method seemed to be the least sensitive and the MPR the most sensitive, as well as the simplest and easiest method to apply.
Conclusions Considering that MPR is the simplest and easiest pooling method to use for epidemiologists and applied researchers, we cautiously recommend using the MPR method to pool categorical variables with more than two levels after Multiple Imputation in combination with Backward Selection (BWS) procedures. Because MPR never performed worse than the other methods for continuous and dichotomous variables, we also advise using MPR for these types of variables.
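To make the Median-P-Rule concrete, the following is a minimal sketch of one backward-selection step, assuming that each variable already has one p-value per imputed dataset (for a categorical variable, the p-value of its overall test in that dataset). The function names, variable names, and p-values are hypothetical illustrations and are not taken from the paper; the threshold of 0.05 matches the p = 0.05 reported in the abstract.

```python
from statistics import median


def median_p_rule(p_values_by_variable):
    """Pool per-imputation p-values into one p-value per variable (the median)."""
    return {name: median(ps) for name, ps in p_values_by_variable.items()}


def backward_step(p_values_by_variable, alpha=0.05):
    """Return the variable to drop in one backward-selection step, or None.

    The variable with the largest pooled (median) p-value is dropped
    if that p-value exceeds alpha; otherwise selection stops.
    """
    pooled = median_p_rule(p_values_by_variable)
    worst = max(pooled, key=pooled.get)
    return worst if pooled[worst] > alpha else None


# Hypothetical p-values from m = 5 imputed datasets (illustration only).
example = {
    "age": [0.01, 0.02, 0.03, 0.01, 0.04],
    "smoking": [0.20, 0.35, 0.08, 0.41, 0.30],
    "education": [0.04, 0.06, 0.03, 0.07, 0.05],
}

pooled = median_p_rule(example)
dropped = backward_step(example)
print(pooled)
print(dropped)
```

In a full backward-selection procedure, the model would be refitted in every imputed dataset after each removal and the p-values recomputed before the next step; the sketch shows only the pooling and the removal decision.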
Pages: 11