Joint modelling rationale for chained equations

Cited by: 61
Authors
Hughes, Rachael A. [1]
White, Ian R. [2]
Seaman, Shaun R. [2]
Carpenter, James R. [3,4]
Tilling, Kate [1]
Sterne, Jonathan A. C. [1]
Affiliations
[1] Univ Bristol, Sch Social & Community Med, Bristol, Avon, England
[2] MRC, Inst Publ Hlth, Biostat Unit, Cambridge, England
[3] London Sch Hyg & Trop Med, London WC1, England
[4] MRC, Clin Trials Unit, London, England
Funding
UK Medical Research Council;
Keywords
Chained equations imputation; Gibbs sampling; Joint modelling imputation; Multiple imputation; Multivariate missing data; MULTIPLE IMPUTATION; CHECKING COMPATIBILITY; DISCRETE; DISTRIBUTIONS;
DOI
10.1186/1471-2288-14-28
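Chinese Library Classification number
R19 [Health care organization and services (health services administration)];
Subject classification code
Abstract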
Background: Chained equations imputation is widely used in medical research. It uses a set of conditional models, so is more flexible than joint modelling imputation for the imputation of different types of variables (e.g. binary, ordinal or unordered categorical). However, chained equations imputation does not correspond to drawing from a joint distribution when the conditional models are incompatible. Concurrently with our work, other authors have shown the equivalence of the two imputation methods in finite samples.
Methods: Taking a different approach, we prove, in finite samples, sufficient conditions for chained equations and joint modelling to yield imputations from the same predictive distribution. Further, we apply this proof in four specific cases and conduct a simulation study which explores the consequences when the conditional models are compatible but the conditions otherwise are not satisfied.
Results: We provide an additional "non-informative margins" condition which, together with compatibility, is sufficient. We show that the non-informative margins condition is not satisfied, despite compatible conditional models, in a situation as simple as two continuous variables and one binary variable. Our simulation study demonstrates that as a consequence of this violation order effects can occur; that is, systematic differences depending upon the ordering of the variables in the chained equations algorithm. However, the order effects appear to be small, especially when associations between variables are weak.
Conclusions: Since chained equations is typically used in medical research for datasets with different types of variables, researchers must be aware that order effects are likely to be ubiquitous, but our results suggest they may be small enough to be negligible.
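The chained equations algorithm described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code or their simulation design: the function name `chained_equations`, the `order` argument, and the toy data are all hypothetical. It assumes every variable is continuous and imputed from a normal linear regression on the others, and it ignores uncertainty in the regression parameters. The `order` argument is the visiting sequence whose influence the abstract calls an order effect; because this sketch has no binary variable, it does not reproduce the violation the paper studies.

```python
import numpy as np

def chained_equations(data, order, n_iter=10, seed=0):
    """Impute NaNs by cycling through one conditional model per variable.

    Simplifying assumptions (not from the paper): every variable is
    continuous, each conditional model is a normal linear regression,
    and regression parameters are treated as known once fitted.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()

    # Start from column means so every regression has complete predictors.
    col_means = np.nanmean(data, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    n, p = filled.shape
    for _ in range(n_iter):
        for j in order:  # the visiting sequence is the "ordering"
            miss_j = missing[:, j]
            if not miss_j.any():
                continue
            others = [k for k in range(p) if k != j]
            X = np.column_stack([np.ones(n), filled[:, others]])
            obs = ~miss_j
            # Fit the conditional model on rows where variable j is observed.
            beta, *_ = np.linalg.lstsq(X[obs], filled[obs, j], rcond=None)
            resid = filled[obs, j] - X[obs] @ beta
            dof = max(int(obs.sum()) - X.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Draw imputations from the fitted conditional distribution.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = X[miss_j] @ beta + noise
    return filled

# Toy data: three correlated continuous variables, about 20% missing.
rng = np.random.default_rng(1)
complete = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.5, 0.3], [0.5, 1.0, 0.4], [0.3, 0.4, 1.0]],
    size=200,
)
data = complete.copy()
data[rng.random(data.shape) < 0.2] = np.nan

imp_a = chained_equations(data, order=[0, 1, 2])
imp_b = chained_equations(data, order=[2, 1, 0])
```

Comparing estimates from `imp_a` and `imp_b` across many simulated datasets is one way to look for systematic differences between orderings; a single pair of runs only shows random variation.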
Pages: 10