Choosing Imputation Models

Cited by: 3
Author
Marbach, Moritz [1]
Affiliation
[1] Texas A&M University, Bush School of Government and Public Service, 4220 TAMU, College Station, TX 77843, USA
Funding
Swiss National Science Foundation
Keywords
missing data; imputation; weighting; multiple imputation; discrete; balance
DOI
10.1017/pan.2021.39
Chinese Library Classification (CLC) number
D0 [Political science, political theory]
Discipline classification code
0302; 030201
Abstract
Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to that of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balance covariates whose distribution is expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the density of imputed and observed values. We illustrate the application of the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
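The selection procedure in the abstract has three steps. First, reweight the observed units so their covariates resemble those of the units with missing values. Second, measure how far each model's imputed values lie from the reweighted observed values. Third, keep the model with the smallest discrepancy.

The sketch below illustrates those steps under simplifying assumptions. It is not the method or API of the R package that accompanies the letter. The weights are closed-form minimum-variance weights that exactly match covariate means. This corresponds to stable balancing weights with zero tolerance and *without* the nonnegativity constraint, so a few weights can come out negative. The discrepancy statistic is a weighted Kolmogorov-Smirnov distance, chosen here as one example. All function names are hypothetical, and so are the simulated data and the two candidate imputers (stochastic regression vs. unconditional mean).

```python
import random
import statistics


def solve(a, b):
    """Solve the square linear system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def min_variance_weights(x_obs, target):
    """Hypothetical helper: weights on observed rows that sum to 1, reproduce the
    `target` covariate means exactly, and stay as close to uniform (1/n) as possible
    in squared distance. Simplification: no w >= 0 constraint, so weights may be negative."""
    n, k = len(x_obs), len(target)
    a = [[1.0] * n] + [[row[j] for row in x_obs] for j in range(k)]  # constraint rows
    b = [1.0] + list(target)
    w0 = [1.0 / n] * n
    resid = [b[i] - sum(p * q for p, q in zip(a[i], w0)) for i in range(k + 1)]
    gram = [[sum(p * q for p, q in zip(a[i], a[j])) for j in range(k + 1)] for i in range(k + 1)]
    lam = solve(gram, resid)  # Lagrange multipliers: w = w0 + A^T lam
    return [w0[t] + sum(a[i][t] * lam[i] for i in range(k + 1)) for t in range(n)]


def weighted_ks(y_obs, weights, y_imp):
    """Largest gap between the weighted ECDF of observed values and the ECDF of imputed values."""
    gap = 0.0
    for v in sorted(set(y_obs) | set(y_imp)):
        f_obs = sum(w for y, w in zip(y_obs, weights) if y <= v)
        f_imp = sum(1 for y in y_imp if y <= v) / len(y_imp)
        gap = max(gap, abs(f_obs - f_imp))
    return gap


def choose_model(x_obs, y_obs, x_mis, imputations):
    """Balance observed rows toward the missing rows' covariate means, then score each
    model's imputed values against the weighted observed values (lower is better)."""
    k = len(x_mis[0])
    target = [statistics.fmean([row[j] for row in x_mis]) for j in range(k)]
    w = min_variance_weights(x_obs, target)
    scores = {name: weighted_ks(y_obs, w, imp) for name, imp in imputations.items()}
    return min(scores, key=scores.get), scores, w, target


# Illustrative simulated data: y depends on x, and y is missing more often when x > 0 (MAR).
random.seed(1)
rows = []
for _ in range(600):
    x = random.gauss(0, 1)
    y = 2 * x + random.gauss(0, 1)
    rows.append((x, y, random.random() < (0.5 if x > 0 else 0.2)))
x_obs = [[x] for x, y, miss in rows if not miss]
y_obs = [y for x, y, miss in rows if not miss]
x_mis = [[x] for x, y, miss in rows if miss]

# Two hypothetical candidate imputers fitted on the observed rows only.
xs = [r[0] for r in x_obs]
mx, my = statistics.fmean(xs), statistics.fmean(y_obs)
slope = sum((p - mx) * (q - my) for p, q in zip(xs, y_obs)) / sum((p - mx) ** 2 for p in xs)
intercept = my - slope * mx
resid_sd = statistics.stdev([q - (intercept + slope * p) for p, q in zip(xs, y_obs)])
imputations = {
    "regression": [intercept + slope * r[0] + random.gauss(0, resid_sd) for r in x_mis],
    "mean": [my] * len(x_mis),
}
best, scores, w, target = choose_model(x_obs, y_obs, x_mis, imputations)
print(best, scores)
```

Mean balance has a closed-form solution, so this sketch needs no quadratic-programming solver. A fuller version of stable balancing weights would add the nonnegativity constraint and a balance tolerance, and it would pass that problem to a QP solver. Here the collapsed point mass from mean imputation should yield a much larger discrepancy than the regression draws.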
Pages: 597-605
Number of pages: 9