The impact of missing data on analyses of a time-dependent exposure in a longitudinal cohort: A simulation study

Cited by: 15
Authors
Karahalios A. [1, 2]
Baglietto L. [1, 2]
Lee K.J. [3, 4]
English D.R. [1, 2]
Carlin J.B. [1, 3]
Simpson J.A. [1, 2]
Affiliations
[1] Centre for Molecular, Environmental, Genetic, and Analytic Epidemiology, Melbourne School of Population and Global Health, University of Melbourne, Parkville
[2] Cancer Epidemiology Centre, Cancer Council Victoria, Melbourne
[3] Clinical Epidemiology and Biostatistics Unit, Murdoch Childrens Research Institute, Parkville, VIC
[4] Department of Paediatrics, University of Melbourne, Parkville, VIC
Source
Emerging Themes in Epidemiology, Volume 10, Issue 1
Funding
UK Medical Research Council
Keywords
Complete-case analysis; Missing exposure; Multiple imputation; Repeated exposure measurement; Simulation study
DOI
10.1186/1742-7622-10-6
Abstract
Background: Missing data often cause problems in longitudinal cohort studies with repeated follow-up waves. Research in this area has focussed on analyses with missing data in repeated measures of the outcome, from which participants with missing exposure data are typically excluded. We performed a simulation study to compare complete-case analysis with multiple imputation (MI) for dealing with missing data in an analysis of the association between waist circumference, measured at two waves, and the risk of colorectal cancer (a completely observed outcome).
Methods: We generated 1,000 datasets of 41,476 individuals with values of waist circumference at waves 1 and 2 and times to the events of colorectal cancer and death, to resemble the distributions of the data from the Melbourne Collaborative Cohort Study. Three proportions of missing data (15, 30 and 50%) were imposed on waist circumference at wave 2 under three missing data mechanisms: missing completely at random (MCAR), a realistic covariate-dependent missing at random (MAR) scenario, and a more extreme covariate-dependent MAR scenario. We assessed the impact of missing data on two epidemiological analyses: 1) the association between change in waist circumference between waves 1 and 2 and the risk of colorectal cancer, adjusted for waist circumference at wave 1; and 2) the association between waist circumference at wave 2 and the risk of colorectal cancer, not adjusted for waist circumference at wave 1.
Results: We observed very little bias for complete-case analysis or MI under all missing data scenarios, and the resulting coverage of interval estimates was near the nominal 95% level. MI showed gains in precision when waist circumference was included as a strong auxiliary variable in the imputation model.
Conclusions: This simulation study, based on data from a longitudinal cohort study, demonstrates that there is little gain in performing MI compared to a complete-case analysis in the presence of up to 50% missing data for the exposure of interest when the data are MCAR, or missing dependent on covariates. MI will result in some gain in precision if a strong auxiliary variable that is not in the analysis model is included in the imputation model. © 2013 Karahalios et al.; licensee BioMed Central Ltd.
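Illustrative sketch: the missing-data design described in the Methods can be mimicked on a small scale with the Python code below. It is not the authors' code. It assumes a continuous outcome analysed by ordinary least squares as a stand-in for the time-to-event analysis, uses age as the only hypothetical covariate driving the MAR mechanism, and uses invented parameter values throughout. It covers only the complete-case analysis; an MI step, which could use waist circumference at wave 1 as an auxiliary variable, is left out.

```python
import numpy as np


def simulate_cohort(n, rng):
    """Generate one synthetic cohort. All coefficients are invented for illustration."""
    age = rng.normal(55.0, 9.0, n)  # hypothetical covariate
    waist1 = 85.0 + 0.2 * (age - 55.0) + rng.normal(0.0, 12.0, n)  # wave 1
    waist2 = 5.0 + 0.95 * waist1 + rng.normal(0.0, 6.0, n)  # wave 2
    # Continuous outcome standing in for the cancer outcome (assumption).
    outcome = 0.03 * waist2 + 0.01 * age + rng.normal(0.0, 1.0, n)
    return age, waist1, waist2, outcome


def missing_probability(age, mechanism, proportion, strength=1.0):
    """Per-person probability that wave 2 waist circumference is missing."""
    if mechanism == "MCAR":
        return np.full(age.shape, proportion)
    # Covariate-dependent MAR: logistic in standardised age. The intercept is
    # found by bisection so that the average probability equals `proportion`.
    z = (age - age.mean()) / age.std()
    lo, hi = -20.0, 20.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        mean_p = np.mean(1.0 / (1.0 + np.exp(-(mid + strength * z))))
        if mean_p < proportion:
            lo = mid
        else:
            hi = mid
    return 1.0 / (1.0 + np.exp(-(mid + strength * z)))


def complete_case_slope(age, waist2, outcome, missing):
    """Least-squares slope for wave 2 waist, adjusted for age, on complete cases."""
    keep = ~missing
    design = np.column_stack([np.ones(keep.sum()), waist2[keep], age[keep]])
    coef, *_ = np.linalg.lstsq(design, outcome[keep], rcond=None)
    return coef[1]


rng = np.random.default_rng(2013)
age, waist1, waist2, outcome = simulate_cohort(20_000, rng)
for mechanism in ("MCAR", "MAR"):
    for proportion in (0.15, 0.30, 0.50):
        p = missing_probability(age, mechanism, proportion)
        missing = rng.random(age.size) < p
        slope = complete_case_slope(age, waist2, outcome, missing)
        print(f"{mechanism} {proportion:.0%} missing: slope = {slope:.4f} (true 0.03)")
```

Because missingness in this sketch depends only on a covariate that is also in the analysis model, the complete-case slope should stay close to its true value as the missing proportion grows, with the loss showing up in precision rather than bias.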