Multiple imputation of unordered categorical missing data: A comparison of the multivariate normal imputation and multiple imputation by chained equations

Cited by: 1
Authors
Karangwa, Innocent [1]
Kotze, Danelle [1]
Blignaut, Renette [1]
Affiliations
[1] Univ Western Cape, Dept Stat & Populat Studies, Private Bag X17, ZA-7535 Bellville, South Africa
Keywords
Missing data; missing at random; multiple imputation; multivariate normal imputation; multiple imputation by chained equations; categorical data; fully conditional specification; simulation; outcomes; work
DOI
10.1214/15-BJPS292
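Chinese Library Classification number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714
Abstract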
Missing data are common in survey data sets: enrolled subjects often do not have data recorded for all variables of interest. Inappropriate handling of missing values may negatively affect the inferences drawn, so special attention is needed when analysing incomplete data. Multivariate normal imputation (MVNI) and multiple imputation by chained equations (MICE) have emerged as the best techniques for dealing with missing data. The former assumes that the variables in the imputation model follow a normal distribution, while the latter fills in missing values taking into account the distributional form of each variable to be imputed. This study examines the performance of these methods when data are missing at random on unordered categorical variables treated as predictors in regression models. First, a survey data set with no missing values is used to generate a data set with missing at random observations on unordered categorical variables. Then, the two methods are used separately to impute the missing values of the generated data set. Their performance is compared in terms of the bias and standard errors of the estimates from regression models that determine the association between a woman's contraceptive use status and her marital status, controlling for region of origin. The baseline data set is the 2007 Demographic and Health Survey (DHS) from the Democratic Republic of Congo. The findings indicate that, although MVNI relies on parametric statistical theory, it produces more accurate estimates than MICE for unordered categorical variables.
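The first step described in the abstract, deleting values from a complete data set so that they are missing at random, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function name `make_mar`, the variable names, the simulated data, and the deletion rates are all assumptions chosen for illustration. The only property it is meant to show is that the probability of deletion depends on a fully observed covariate (here a hypothetical binary region indicator) and not on the value being deleted.

```python
import random

def make_mar(values, covariate, base_rate=0.1, extra_rate=0.3, seed=0):
    """Return a copy of `values` with some entries set to None.

    The deletion probability depends only on the fully observed
    `covariate` (base_rate, plus extra_rate when the covariate is 1),
    never on the value itself, which is what makes it MAR.
    All rates are illustrative assumptions.
    """
    rng = random.Random(seed)
    out = []
    for value, cov in zip(values, covariate):
        p_missing = base_rate + (extra_rate if cov == 1 else 0.0)
        out.append(None if rng.random() < p_missing else value)
    return out

# Hypothetical complete data: a binary region indicator and an
# unordered categorical marital status (simulated, not DHS data).
data_rng = random.Random(42)
n = 5000
region = [data_rng.randint(0, 1) for _ in range(n)]
marital = [data_rng.choice(["never married", "married", "widowed"])
           for _ in range(n)]

marital_mar = make_mar(marital, region)

def missing_rate(group):
    """Share of deleted marital-status values within one region group."""
    idx = [i for i in range(n) if region[i] == group]
    return sum(marital_mar[i] is None for i in idx) / len(idx)

print("missing rate, region 0:", round(missing_rate(0), 3))
print("missing rate, region 1:", round(missing_rate(1), 3))
```

In a comparison like the one the abstract describes, the incomplete variable would then be imputed by each method and the resulting regression estimates compared with those from the complete data.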
Pages: 521-539
Number of pages: 19