FINDING A FLEXIBLE HOT-DECK IMPUTATION METHOD FOR MULTINOMIAL DATA

Cited by: 5
Authors
Andridge, Rebecca [1]
Bechtel, Laura [2]
Thompson, Katherine Jenny [2]
Affiliations
[1] Ohio State Univ, 1841 Neil Ave, Columbus, OH 43210 USA
[2] US Census Bur, 4600 Silver Hill Rd, Washington, DC 20233 USA
Keywords
MULTIPLE IMPUTATION
DOI
10.1093/jssam/smaa005
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Subject classification codes
03; 0303; 0701; 070101
Abstract
Detailed breakdowns on totals are often collected in surveys, such as a breakdown of total product sales by product type. These multinomial data are often sparsely reported with wide variability in proportions across units. In addition, there are often true zeros that differ across units even within industry; for example, one establishment sells jeans but not shoes, and another sells shoes but not socks. It is quite common to have large fractions of missing data for these detailed items, even when totals are relatively completely observed. Hot-deck imputation, which fills in missing data with observed data values, is an attractive approach. The entire set of proportions can be simultaneously imputed to preserve multinomial distributions, and zero values can be imputed. However, it is not clear what variant of the hot deck is best. We describe a large set of "flavors" of the hot deck and compare them through simulation and by application to data from the 2012 Economic Census. We consider different ways to create the donor pool: choosing one nearest neighbor (NN), choosing from five NNs, or using all units as the donor pool. We also consider different ways to impute from the donor: directly impute the donor's vector of proportions or randomly draw from a multinomial distribution using this vector of proportions. We consider scenarios where a strong predictor of these multinomial distributions exists as well as when covariate information is weak.
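The donor-pool and imputation choices described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `hot_deck_impute`, its parameters, the use of a single numeric covariate, and the absolute-difference distance are all assumptions made for illustration, since the abstract does not specify how nearest neighbors are determined.

```python
import numpy as np


def hot_deck_impute(x_donors, p_donors, x_recipient, total,
                    pool="nn5", draw=False, rng=None):
    """Impute the detail breakdown of an observed total for one recipient.

    Hypothetical interface (not from the paper):
      x_donors    : (n,) covariate values for respondents (assumed matching variable)
      p_donors    : (n, k) observed proportion vectors, each row summing to 1
      x_recipient : covariate value of the unit with missing details
      total       : the recipient's observed total (non-negative integer)
      pool        : "nn1" (one nearest neighbor), "nn5" (five), or "all"
      draw        : False -> apply the donor's proportions directly;
                    True  -> draw counts from a multinomial with those proportions
    """
    rng = np.random.default_rng() if rng is None else rng
    x_donors = np.asarray(x_donors, dtype=float)
    p_donors = np.asarray(p_donors, dtype=float)

    # Rank respondents by distance to the recipient (assumed: absolute difference).
    order = np.argsort(np.abs(x_donors - x_recipient), kind="stable")

    if pool == "nn1":
        candidates = order[:1]
    elif pool == "nn5":
        candidates = order[:5]
    elif pool == "all":
        candidates = order
    else:
        raise ValueError(f"unknown pool: {pool!r}")

    # Pick one donor at random from the pool and take its whole proportion vector,
    # so zeros in the donor's vector carry over to the recipient.
    donor = rng.choice(candidates)
    p = p_donors[donor]

    if draw:
        return rng.multinomial(total, p)  # integer counts summing to total
    return total * p                      # donor proportions scaled to the total
```

With `draw=False` the recipient receives the donor's proportions exactly; with `draw=True` the imputed counts vary around those proportions, but any category with proportion zero in the donor still receives a zero count.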
Pages: 789-809
Page count: 21