An Empirical Comparison of Multiple Imputation Methods for Categorical Data

Times cited: 44

Authors:

Akande, Olanrewaju [1]

Li, Fan [1]

Reiter, Jerome [1]

Affiliation:

[1] Duke Univ, Dept Stat Sci, Durham, NC 27708 USA

Source:

AMERICAN STATISTICIAN | 2017 / Vol. 71 / No. 02

Funding:

National Science Foundation (US)

Keywords:

Latent; Missing; Mixture; Nonresponse; Tree; IMPLEMENTATION;

DOI:

10.1080/00031305.2016.1277158

Chinese Library Classification (CLC) codes:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]

Discipline classification codes:

020208; 070103; 0714

Abstract:

Multiple imputation is a common approach for dealing with missing values in statistical databases. The imputer fills in missing values with draws from predictive models estimated from the observed data, resulting in multiple, completed versions of the database. Researchers have developed a variety of default routines to implement multiple imputation; however, there has been limited research comparing the performance of these methods, particularly for categorical data. We use simulation studies to compare repeated sampling properties of three default multiple imputation methods for categorical data, including chained equations using generalized linear models, chained equations using classification and regression trees, and a fully Bayesian joint distribution based on Dirichlet process mixture models. We base the simulations on categorical data from the American Community Survey. In the circumstances of this study, the results suggest that default chained equations approaches based on generalized linear models are dominated by the default regression tree and Bayesian mixture model approaches. They also suggest competing advantages for the regression tree and Bayesian mixture model approaches, making both reasonable default engines for multiple imputation of categorical data. Supplementary material for this article is available online.
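The abstract describes multiple imputation as filling in missing values with draws from predictive models estimated from the observed data, which produces several completed copies of the database. The Python below is a minimal sketch of that general idea only. It imputes a single categorical variable from a multinomial model with a Dirichlet prior, assuming ignorable missingness. This model is an illustrative assumption and is not one of the three engines compared in the article (GLM-based chained equations, CART-based chained equations, or the Dirichlet process mixture model). The sketch then pools a per-copy estimate with Rubin's combining rules, the standard way to combine multiply imputed estimates. All function names (`draw_dirichlet`, `impute_once`, `multiply_impute`, `rubin_combine`) and the toy data are hypothetical.

```python
import random
import statistics

# Hypothetical helpers for illustration; not the article's code or any library API.


def draw_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(gammas)
    return [g / total for g in gammas]


def impute_once(values, categories, rng, prior=1.0):
    """Return one completed copy of `values`, where None marks a missing entry.

    Assumes ignorable (e.g. MCAR) missingness and a single variable: the
    category probabilities are drawn from their Dirichlet(counts + prior)
    posterior given the observed counts, and each missing entry is then
    drawn from those probabilities.
    """
    counts = [sum(1 for v in values if v == c) for c in categories]
    probs = draw_dirichlet([n + prior for n in counts], rng)
    return [v if v is not None else rng.choices(categories, weights=probs)[0]
            for v in values]


def multiply_impute(values, categories, m=5, seed=0):
    """Create m completed copies, each with its own posterior draw of the probabilities."""
    rng = random.Random(seed)
    return [impute_once(values, categories, rng) for _ in range(m)]


def rubin_combine(estimates, variances):
    """Pool m point estimates and within-imputation variances with Rubin's rules.

    Returns (pooled estimate, total variance); requires m >= 2.
    """
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # pooled point estimate
    u_bar = statistics.fmean(variances)   # average within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b


# Toy data (hypothetical): estimate the share of category "A".
data = ["A", "B", None, "A", "C", None, "A", "B"]
cats = ["A", "B", "C"]
completed = multiply_impute(data, cats, m=5, seed=42)

n = len(data)
props = [sum(v == "A" for v in d) / n for d in completed]
variances = [p * (1 - p) / n for p in props]  # variance of a sample proportion
est, total_var = rubin_combine(props, variances)
print(f"pooled P(A) = {est:.3f}, total variance = {total_var:.4f}")
```

Because each completed copy uses a fresh posterior draw of the category probabilities, the spread of the estimates across copies carries the uncertainty about the imputation model into the total variance. A realistic imputer would also condition each variable on the others, which is what the chained-equations and mixture-model engines in the study are designed to do.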


Pages: 162-170

Number of pages: 9

Total: 50 entries

[1] Multiple imputation of unordered categorical missing data: A comparison of the multivariate normal imputation and multiple imputation by chained equations
Karangwa, Innocent
Kotze, Danelle
Blignaut, Renette
BRAZILIAN JOURNAL OF PROBABILITY AND STATISTICS, 2016, 30 (04): 521-539
[2] Benchmarking imputation methods for categorical biological data
Gendre, Matthieu
Hauffe, Torsten
Pimiento, Catalina
Silvestro, Daniele
METHODS IN ECOLOGY AND EVOLUTION, 2024, 15 (09): 1624-1638
[3] Multiple imputation for categorical variables in multilevel data
Kottage, Helani Dilshara
BULLETIN OF THE AUSTRALIAN MATHEMATICAL SOCIETY, 2022, 106 (02): 349-350
[4] A nonparametric multiple imputation approach for missing categorical data
Zhou, Muhan
He, Yulei
Yu, Mandi
Hsu, Chiu-Hsieh
BMC MEDICAL RESEARCH METHODOLOGY, 2017, 17
[5] Empirical Comparison of Imputation Methods for Multivariate Missing Data in Public Health
Pan, Steven
Chen, Sixia
INTERNATIONAL JOURNAL OF ENVIRONMENTAL RESEARCH AND PUBLIC HEALTH, 2023, 20 (02)
[6] A comparison of multiple imputation methods for missing data in longitudinal studies
Huque, Md Hamidul
Carlin, John B.
Simpson, Julie A.
Lee, Katherine J.
BMC MEDICAL RESEARCH METHODOLOGY, 2018, 18
[7] A comparison of multiple imputation methods for incomplete longitudinal binary data
Yamaguchi, Yusuke
Misumi, Toshihiro
Maruo, Kazushi
JOURNAL OF BIOPHARMACEUTICAL STATISTICS, 2018, 28 (04): 645-667
[8] A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns
Solaro, N.
Barbiero, A.
Manzi, G.
Ferrari, P. A.
JOURNAL OF STATISTICAL COMPUTATION AND SIMULATION, 2018, 88 (18): 3588-3619
