Imputing missing covariates in time-to-event analysis within distributed research networks: A simulation study

Times cited: 2
Authors
Li, Dongdong [1,2]
Wong, Jenna [1,2]
Li, Xiaojuan [1,2]
Toh, Sengwee [1,2]
Wang, Rui [1,2,3]
Affiliations
[1] Harvard Pilgrim Health Care Institute, Department of Population Medicine, Boston, MA 02215, USA
[2] Harvard Medical School, Boston, MA 02215, USA
[3] Harvard T.H. Chan School of Public Health, Department of Biostatistics, Boston, MA, USA
Funding
Agency for Healthcare Research and Quality (AHRQ)
Keywords
Cox model; distributed research networks; missing covariates; multiple imputation; simulation study; PROPORTIONAL HAZARDS REGRESSION; MULTIPLE IMPUTATION; EQUATIONS; MICE
DOI
10.1002/pds.5563
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Subject classification codes
1004; 120402
Abstract
Purpose: In distributed research network (DRN) settings, multiple imputation cannot be implemented directly because pooling individual-level data is often not feasible. How multiple imputation performs in combination with meta-analysis within DRNs is not well understood.
Methods: To evaluate imputation of missing baseline covariate data combined with meta-analysis for time-to-event analysis within DRNs, we conducted simulation studies motivated by a real-world data set. We compared two parametric algorithms, an approximate linear imputation model (Approx) and a nonlinear substantive-model-compatible imputation model (SMC), with two nonparametric machine learning algorithms, random forest (RF) and classification and regression trees (CART).
Results: With small effect sizes (i.e., log hazard ratios [logHRs]) and missingness mechanisms that were homogeneous across sites, all imputation methods produced unbiased and more efficient estimates, whereas complete-case analysis could be biased and inefficient. Under heterogeneous missingness mechanisms, estimates from the RF method could be more efficient. Estimates from distributed imputation combined by meta-analysis were similar to those from imputation using pooled data. When logHRs were large, the SMC imputation algorithm generally performed better than the others.
Conclusions: These findings suggest that imputation within DRNs is valid and feasible for handling missing covariate data in time-to-event analysis across a range of settings. The performance of the four imputation algorithms varies with the effect size and the level of missingness.
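The abstract does not spell out how per-site imputation results are combined. The following is a minimal sketch, assuming each site fits the Cox model to each of its m imputed data sets and shares only the estimated log hazard ratio and its variance. Rubin's rules pool the m estimates within a site, and a fixed-effect inverse-variance meta-analysis then combines the sites. The function names (`rubin_pool`, `fixed_effect_meta`, `distributed_mi_meta`), the within-site-first ordering, and the fixed-effect weighting are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Pool m per-imputation estimates from one site with Rubin's rules.

    Returns (pooled estimate, total variance). Degrees-of-freedom
    adjustments are omitted here (normal approximation).
    """
    m = len(estimates)
    q_bar = mean(estimates)                    # average point estimate
    u_bar = mean(variances)                    # average within-imputation variance
    b = variance(estimates) if m > 1 else 0.0  # between-imputation variance
    total = u_bar + (1 + 1 / m) * b            # Rubin's total variance
    return q_bar, total


def fixed_effect_meta(estimates, variances):
    """Inverse-variance weighted (fixed-effect) meta-analysis across sites.

    Returns (pooled estimate, standard error).
    """
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))


def distributed_mi_meta(site_results):
    """Hypothetical helper: site_results holds one
    (logHR estimates, variances) pair per site, each with m values.
    Only these summaries, never individual-level rows, leave a site.
    """
    site_est, site_var = [], []
    for est, var in site_results:
        q, t = rubin_pool(est, var)
        site_est.append(q)
        site_var.append(t)
    return fixed_effect_meta(site_est, site_var)


# Illustrative numbers only (not from the study): two sites, m = 2.
log_hr, se = distributed_mi_meta([
    ([0.40, 0.60], [0.01, 0.03]),
    ([0.50, 0.50], [0.05, 0.05]),
])
print(f"pooled logHR = {log_hr:.3f}, HR = {math.exp(log_hr):.3f}, SE = {se:.3f}")
```

Fixed-effect weighting keeps the sketch short. When sites differ substantially, as under heterogeneous missingness mechanisms, a random-effects model may be the more appropriate way to combine them.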
Pages: 330-340
Page count: 11