Methods for Dealing With Missing Covariate Data in Epigenome-Wide Association Studies

Cited by: 4
Authors
Mills, Harriet L. [1]
Heron, Jon [1]
Relton, Caroline [1]
Suderman, Matt [1]
Tilling, Kate [1]
Affiliations
[1] Univ Bristol, Bristol Med Sch, MRC Integrat Epidemiol Unit, Oakfield House, Bristol BS8 2BN, Avon, England
Funding
UK Medical Research Council; UK Biotechnology and Biological Sciences Research Council; Wellcome Trust (UK)
Keywords
Accessible Resource for Integrated Epigenomics Studies; Avon Longitudinal Study of Parents and Children; epigenetic data; imputation; missing data; MULTIPLE IMPUTATION; STRATEGIES; REGRESSION; SMOKING; VALUES; MICE;
DOI
10.1093/aje/kwz186
Chinese Library Classification (CLC) number
R1 [Preventive Medicine, Hygiene]
Discipline classification code
1004; 120402
Abstract
Multiple imputation (MI) is a well-established method for dealing with missing data. MI is computationally intensive when imputing missing covariates with high-dimensional outcome data (e.g., DNA methylation data in epigenome-wide association studies (EWAS)), because every outcome variable must be included in the imputation model to avoid biasing associations towards the null. Instead, EWAS analyses are reduced to only complete cases, limiting statistical power and potentially causing bias. We used simulations to compare 5 MI methods for high-dimensional data under 2 missingness mechanisms. All imputation methods had increased power over complete-case (C-C) analyses. Imputing missing values separately for each variable was computationally inefficient, but dividing sites at random into evenly sized bins improved efficiency and gave low bias. Methods imputing solely using subsets of sites identified by the C-C analysis suffered from bias towards the null. However, if these subsets were added into random bins of sites, this bias was reduced. The optimal methods were applied to an EWAS with missingness in covariates. All methods identified additional sites over the C-C analysis, and many of these sites had been replicated in other studies. These methods are also applicable to other high-dimensional data sets, including the rapidly expanding area of "-omics" studies.
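The binning idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `random_bins` and `add_top_sites`, the site IDs, and the bin count are all hypothetical, and the imputation step itself is left out. The sketch only shows how sites might be split at random into evenly sized bins, and how a subset of sites flagged by the complete-case analysis might be added to every bin.

```python
import random


def random_bins(site_ids, n_bins, seed=0):
    """Shuffle site IDs and split them into n_bins bins whose sizes differ by at most one."""
    rng = random.Random(seed)
    shuffled = list(site_ids)
    rng.shuffle(shuffled)
    # Striding over the shuffled list yields evenly sized, non-overlapping bins.
    return [shuffled[i::n_bins] for i in range(n_bins)]


def add_top_sites(bins, top_sites):
    """Return new bins with the flagged sites added to each, skipping ones already present."""
    augmented = []
    for members in bins:
        present = set(members)
        augmented.append(members + [s for s in top_sites if s not in present])
    return augmented


# Hypothetical example: 10 made-up site IDs, 3 bins, 2 sites flagged by a C-C analysis.
sites = [f"cg{i:04d}" for i in range(10)]
top_sites = ["cg0002", "cg0007"]
bins = random_bins(sites, n_bins=3, seed=1)
augmented = add_top_sites(bins, top_sites)
```

Under this sketch, each bin (or augmented bin) would supply the outcome variables for one imputation model, so no single model has to include every site.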
Pages: 2021-2030
Number of pages: 10