Multiple imputation using auxiliary imputation variables that only predict missingness can increase bias due to data missing not at random

Cited by: 2

Authors:

Curnow, Elinor [1,2]

Cornish, Rosie P. [1,2]

Heron, Jon E. [1,2]

Carpenter, James R. [3,4]

Tilling, Kate [1,2]

Affiliations:

[1] Univ Bristol, Bristol Med Sch, Dept Populat Hlth Sci, Bristol, England

[2] Univ Bristol, Med Res Council, Integrat Epidemiol Unit, Bristol, England

[3] Univ London London Sch Hyg & Trop Med, Dept Med Stat, London, England

[4] UCL, MRC, Clin Trials Unit, London, England

Source:

BMC Medical Research Methodology | 2024 / Volume 24 / Issue 01

Funding:

UK Medical Research Council; Wellcome Trust (UK)

Keywords:

Missing data; Multiple imputation; Bias amplification; Auxiliary variable; ALSPAC;

DOI:

10.1186/s12874-024-02353-9

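Chinese Library Classification number:

R19 [Health care organization and services (health services administration)];

Discipline classification code:

Abstract: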

Background: Epidemiological and clinical studies often have missing data, frequently analysed using multiple imputation (MI). In general, MI estimates will be biased if data are missing not at random (MNAR). Bias due to data MNAR can be reduced by including other variables ("auxiliary variables") in imputation models, in addition to those required for the substantive analysis. Common advice is to take an inclusive approach to auxiliary variable selection (i.e. include all variables thought to be predictive of missingness and/or the missing values). There are no clear guidelines about the impact of this strategy when data may be MNAR.

Methods: We explore the impact of including an auxiliary variable predictive of missingness but, in truth, unrelated to the partially observed variable, when data are MNAR. We quantify, algebraically and by simulation, the magnitude of the additional bias of the MI estimator for the exposure coefficient (fitting either a linear or logistic regression model), when the (continuous or binary) partially observed variable is either the analysis outcome or the exposure. Here, "additional bias" refers to the difference in magnitude of the MI estimator when the imputation model includes (i) the auxiliary variable and the other analysis model variables; (ii) just the other analysis model variables, noting that both will be biased due to data MNAR. We illustrate the extent of this additional bias by re-analysing data from a birth cohort study.

Results: The additional bias can be relatively large when the outcome is partially observed and missingness is caused by the outcome itself, and even larger if missingness is caused by both the outcome and the exposure (when either the outcome or exposure is partially observed).

Conclusions: When using MI, the naïve and commonly used strategy of including all available auxiliary variables should be avoided. We recommend including the variables most predictive of the partially observed variable as auxiliary variables, where these can be identified through consideration of the plausible causal diagrams and missingness mechanisms, as well as data exploration (noting that associations with the partially observed variable in the complete records may be distorted due to selection bias).
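
The comparison described in the Methods can be illustrated with a small simulation. The sketch below is not the authors' simulation design: the function name `simulate_additional_bias`, the sample size, the true exposure coefficient of 1, and the missingness coefficients of 2 are all assumptions chosen for illustration. It also uses a simplified stochastic regression imputation (imputation model parameters are held at their complete-record estimates rather than drawn from a posterior), which is enough to show the point estimates in a large sample. The outcome `y` is partially observed, missingness depends on `y` itself (MNAR) and on an auxiliary variable `z`, and `z` is unrelated to both `x` and `y`.

```python
import numpy as np


def simulate_additional_bias(n=100_000, beta=1.0, n_imp=5, seed=0):
    """Compare MI estimates of the exposure coefficient with and without
    an auxiliary variable that predicts missingness only (illustrative)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                  # fully observed exposure
    z = rng.normal(size=n)                  # auxiliary: unrelated to x and y
    y = beta * x + rng.normal(size=n)       # outcome, true slope = beta

    # MNAR: probability that y is observed depends on y itself and on z.
    # The coefficients 2.0 are assumed values, not taken from the paper.
    p_obs = 1.0 / (1.0 + np.exp(-(2.0 * y + 2.0 * z)))
    obs = rng.random(n) < p_obs

    def mi_slope(predictors):
        # Fit the linear imputation model in the complete records.
        design = np.column_stack([np.ones(n)] + predictors)
        coef, *_ = np.linalg.lstsq(design[obs], y[obs], rcond=None)
        resid = y[obs] - design[obs] @ coef
        sigma = resid.std(ddof=design.shape[1])
        n_mis = int((~obs).sum())

        slopes = []
        for _ in range(n_imp):
            # Simplified imputation: fitted value plus residual noise.
            y_imp = y.copy()
            y_imp[~obs] = design[~obs] @ coef + rng.normal(scale=sigma, size=n_mis)
            # Analysis model: linear regression of y on x.
            slopes.append(np.polyfit(x, y_imp, 1)[0])
        # Pooled point estimate: mean of the per-imputation slopes.
        return float(np.mean(slopes))

    return {
        "true": beta,
        "without_aux": mi_slope([x]),
        "with_aux": mi_slope([x, z]),
    }


estimates = simulate_additional_bias()
print(estimates)
```

Under these assumed settings both estimates fall below the true coefficient because the outcome is MNAR, and the estimate from the imputation model that includes `z` falls further below it. The gap between `with_aux` and `without_aux` corresponds to what the abstract calls the additional bias.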

Pages: 15
