An Empirical Comparison of Multiple Imputation Methods for Categorical Data

Cited by: 44
Authors
Akande, Olanrewaju [1]
Li, Fan [1]
Reiter, Jerome [1]
Affiliations
[1] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA
Source
AMERICAN STATISTICIAN | 2017 / Vol. 71 / No. 02
Funding
National Science Foundation (USA)
Keywords
Latent; Missing; Mixture; Nonresponse; Tree; IMPLEMENTATION;
DOI
10.1080/00031305.2016.1277158
Chinese Library Classification (CLC) number
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject classification codes
020208 ; 070103 ; 0714 ;
Abstract
Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.
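The general workflow described in the abstract (fill in missing values with draws from a model fitted to the observed data, repeat to obtain several completed datasets, then combine the estimates) can be illustrated with a minimal sketch. This is not one of the three methods compared in the article: it substitutes a simple bootstrap-based conditional draw for the chained-equations and mixture-model engines, and the function names (`impute_once`, `rubin_combine`, `multiply_impute_proportion`) and the toy data layout are hypothetical. The combining step follows Rubin's standard rules for a scalar estimate.

```python
import random
from statistics import mean, variance


def impute_once(rows, rng):
    """Return one completed copy of rows.

    rows is a list of (x, y) pairs of categorical values; y may be None (missing).
    Assumption for illustration: y is imputed from its empirical distribution
    given x, estimated on a bootstrap resample of the observed cases.
    """
    observed = [r for r in rows if r[1] is not None]
    # Resampling the observed cases lets each imputation reflect uncertainty
    # in the estimated conditional distribution, not only sampling noise.
    boot = [rng.choice(observed) for _ in observed]
    donors = {}
    for x, y in boot:
        donors.setdefault(x, []).append(y)
    all_y = [y for _, y in boot]
    completed = []
    for x, y in rows:
        if y is None:
            # Fall back to the marginal distribution if x has no donors.
            y = rng.choice(donors.get(x, all_y))
        completed.append((x, y))
    return completed


def rubin_combine(estimates, variances):
    """Combine m point estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)      # pooled point estimate
    u_bar = mean(variances)      # average within-imputation variance
    b = variance(estimates)      # between-imputation variance (needs m >= 2)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


def multiply_impute_proportion(rows, category, m=10, seed=0):
    """Estimate P(y == category) from m completed datasets."""
    rng = random.Random(seed)
    n = len(rows)
    estimates, variances = [], []
    for _ in range(m):
        completed = impute_once(rows, rng)
        p = sum(y == category for _, y in completed) / n
        estimates.append(p)
        variances.append(p * (1 - p) / n)
    return rubin_combine(estimates, variances)
```

For example, `multiply_impute_proportion([("a", "yes"), ("a", "no"), ("b", "yes"), ("a", None), ("b", None)], "yes")` returns a pooled proportion together with a total variance that includes the between-imputation component.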
Pages: 162-170
Page count: 9
Related papers
  • [1] Multiple imputation of unordered categorical missing data: A comparison of the multivariate normal imputation and multiple imputation by chained equations
    Karangwa, Innocent
    Kotze, Danelle
    Blignaut, Renette
    BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2016, 30 (04) : 521 - 539
  • [2] Benchmarking imputation methods for categorical biological data
    Gendre, Matthieu
    Hauffe, Torsten
    Pimiento, Catalina
    Silvestro, Daniele
    METHODS IN ECOLOGY AND EVOLUTION, 2024, 15 (09) : 1624 - 1638
  • [3] MULTIPLE IMPUTATION FOR CATEGORICAL VARIABLES IN MULTILEVEL DATA
    Kottage, Helani Dilshara
    BULLETIN OF THE AUSTRALIAN MATHEMATICAL SOCIETY, 2022, 106 (02) : 349 - 350
  • [4] A nonparametric multiple imputation approach for missing categorical data
    Zhou, Muhan
    He, Yulei
    Yu, Mandi
    Hsu, Chiu-Hsieh
    BMC MEDICAL RESEARCH METHODOLOGY, 2017, 17
  • [5] A nonparametric multiple imputation approach for missing categorical data
    Muhan Zhou
    Yulei He
    Mandi Yu
    Chiu-Hsieh Hsu
    BMC Medical Research Methodology, 17
  • [6] Empirical Comparison of Imputation Methods for Multivariate Missing Data in Public Health
    Pan, Steven
    Chen, Sixia
    INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2023, 20 (02)
  • [7] A comparison of multiple imputation methods for missing data in longitudinal studies
    Md Hamidul Huque
    John B. Carlin
    Julie A. Simpson
    Katherine J. Lee
    BMC Medical Research Methodology, 18
  • [8] A comparison of multiple imputation methods for missing data in longitudinal studies
    Huque, Md Hamidul
    Carlin, John B.
    Simpson, Julie A.
    Lee, Katherine J.
    BMC MEDICAL RESEARCH METHODOLOGY, 2018, 18
  • [9] A comparison of multiple imputation methods for incomplete longitudinal binary data
    Yamaguchi, Yusuke
    Misumi, Toshihiro
    Maruo, Kazushi
    JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2018, 28 (04) : 645 - 667
  • [10] A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns
    Solaro, N.
    Barbiero, A.
    Manzi, G.
    Ferrari, P. A.
    JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (18) : 3588 - 3619