Multiple imputation of missing covariates with non-linear effects and interactions: an evaluation of statistical methods

Cited by: 132
Authors
Seaman, Shaun R. [1]
Bartlett, Jonathan W. [2]
White, Ian R. [1]
Affiliations
[1] Inst Publ Hlth, MRC Biostat Unit, Cambridge CB2 0SR, England
[2] Univ London London Sch Hyg & Trop Med, Dept Med Stat, London WC1E 7HT, England
Keywords
PLASMA;
DOI
10.1186/1471-2288-12-46
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)];
Abstract
Background: Multiple imputation is often used for missing data. When a model contains as covariates more than one function of a variable, it is not obvious how best to impute missing values in these covariates. Consider a regression with outcome Y and covariates X and X^2. In 'passive imputation' a value X* is imputed for X and then X^2 is imputed as (X*)^2. A recent proposal is to treat X^2 as 'just another variable' (JAV) and impute X and X^2 under multivariate normality.

Methods: We use simulation to investigate the performance of three methods that can easily be implemented in standard software: 1) linear regression of X on Y to impute X then passive imputation of X^2; 2) the same regression but with predictive mean matching (PMM); and 3) JAV. We also investigate the performance of analogous methods when the analysis involves an interaction, and study the theoretical properties of JAV. The application of the methods when complete or incomplete confounders are also present is illustrated using data from the EPIC Study.

Results: JAV gives consistent estimation when the analysis is linear regression with a quadratic or interaction term and X is missing completely at random. When X is missing at random, JAV may be biased, but this bias is generally less than for passive imputation and PMM. Coverage for JAV was usually good when bias was small. However, in some scenarios with a more pronounced quadratic effect, bias was large and coverage poor. When the analysis was logistic regression, JAV's performance was sometimes very poor. PMM generally improved on passive imputation, in terms of bias and coverage, but did not eliminate the bias.

Conclusions: Given the current state of available software, JAV is the best of a set of imperfect imputation methods for linear regression with a quadratic or interaction effect, but should not be used for logistic regression.
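The contrast between passive imputation and JAV described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code: the function names, the data-generating values (standard normal X, intercept 0, linear and quadratic coefficients both 1, 50% of X missing completely at random) and the use of NumPy are all assumptions chosen for illustration. It also simplifies multiple imputation by holding the imputation-model parameters fixed at their estimates rather than drawing them from a posterior, so it is only suitable for comparing point estimates, not for variance estimation or coverage.

```python
import numpy as np


def simulate(n, rng, beta=(0.0, 1.0, 1.0), p_miss=0.5):
    """Hypothetical data: Y = b0 + b1*X + b2*X^2 + noise, X missing completely at random."""
    x = rng.normal(size=n)
    y = beta[0] + beta[1] * x + beta[2] * x**2 + rng.normal(size=n)
    miss = rng.random(n) < p_miss  # MCAR: missingness independent of X and Y
    return x, y, miss


def ols(design, target):
    """Least-squares coefficients of target on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def impute_passive(x, y, miss, rng):
    """Impute X from a linear regression on Y, then set X^2 = (X*)^2."""
    obs = ~miss
    d_obs = np.column_stack([np.ones(obs.sum()), y[obs]])
    coef = ols(d_obs, x[obs])
    sd = (x[obs] - d_obs @ coef).std(ddof=2)
    x_imp = x.copy()
    x_imp[miss] = coef[0] + coef[1] * y[miss] + rng.normal(scale=sd, size=miss.sum())
    return x_imp, x_imp**2


def impute_jav(x, y, miss, rng):
    """Treat X^2 as 'just another variable': impute (X, X^2) jointly
    from a bivariate normal linear regression on Y."""
    obs = ~miss
    d_obs = np.column_stack([np.ones(obs.sum()), y[obs]])
    targets = np.column_stack([x[obs], x[obs] ** 2])
    coef = ols(d_obs, targets)  # one column of coefficients per target
    resid = targets - d_obs @ coef
    cov = resid.T @ resid / (obs.sum() - 2)  # residual covariance of (X, X^2) given Y
    d_mis = np.column_stack([np.ones(miss.sum()), y[miss]])
    draws = d_mis @ coef + rng.multivariate_normal(np.zeros(2), cov, size=miss.sum())
    x_imp, x2_imp = x.copy(), x**2
    x_imp[miss], x2_imp[miss] = draws[:, 0], draws[:, 1]
    return x_imp, x2_imp  # imputed X^2 need not equal the square of imputed X


def mean_quadratic_coef(impute, x, y, miss, rng, n_imputations=5):
    """Average the estimated coefficient of X^2 over several imputed datasets."""
    estimates = []
    for _ in range(n_imputations):
        x_imp, x2_imp = impute(x, y, miss, rng)
        design = np.column_stack([np.ones(len(y)), x_imp, x2_imp])
        estimates.append(ols(design, y)[2])
    return float(np.mean(estimates))


if __name__ == "__main__":
    rng = np.random.default_rng(2012)
    x, y, miss = simulate(20_000, rng)
    print("true quadratic coefficient: 1.0")
    print("passive imputation:", mean_quadratic_coef(impute_passive, x, y, miss, rng))
    print("JAV:", mean_quadratic_coef(impute_jav, x, y, miss, rng))
```

In this assumed setting the JAV estimate of the quadratic coefficient should land close to the true value, while the passive estimate is pulled noticeably toward zero, which is in line with the consistency result stated for linear regression under MCAR. The sketch does not cover PMM, the interaction case, missingness at random, or logistic regression, where the abstract reports that JAV can perform poorly.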
Pages: 13