SIMULTANEOUS EDIT AND IMPUTATION FOR HOUSEHOLD DATA WITH STRUCTURAL ZEROS

Authors
Akande, Olanrewaju [1]
Barrientos, Andres [1]
Reiter, Jerome P. [2]
Affiliations
[1] Duke Univ, Dept Stat Sci, POB 90251, Durham, NC 27708 USA
[2] Duke Univ, Stat Sci, Durham, NC 27708 USA
Funding
U.S. National Science Foundation
Keywords
Categorical; Census; Latent; Measurement error; Missing; Mixture; Disclosure limitation
DOI
10.1093/jssam/smy022
Chinese Library Classification (CLC) number
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification code
03 ; 0303 ; 0701 ; 070101 ;
Abstract
Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than that of the child's biological parent) and have missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
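As a rough illustration of what "truncated to allow only households that satisfy all edit constraints" can mean in practice, the following minimal Python sketch checks a single edit rule and uses rejection sampling to draw only valid households. It is not the authors' model: the uniform toy proposal stands in for the mixture model, and the names `satisfies_edits`, `draw_household`, `draw_valid_household`, the relationship codes, and the age ranges are all hypothetical. The sketch also assumes, for simplicity, that the household head is the parent of every child.

```python
import random

# Hypothetical relationship codes, used only in this sketch.
HEAD, CHILD = "head", "child"

def satisfies_edits(household):
    """Return True if the household has exactly one head and every
    child is younger than that head (a single illustrative edit rule)."""
    head_ages = [p["age"] for p in household if p["rel"] == HEAD]
    if len(head_ages) != 1:
        return False
    return all(p["age"] < head_ages[0] for p in household if p["rel"] == CHILD)

def draw_household(rng, size):
    """Toy proposal (an assumption): one head plus size - 1 children,
    each with an independently drawn age."""
    people = [{"rel": HEAD, "age": rng.randint(15, 90)}]
    people += [{"rel": CHILD, "age": rng.randint(0, 90)} for _ in range(size - 1)]
    return people

def draw_valid_household(rng, size, max_tries=10_000):
    """Rejection sampling: redraw until the edit rule holds, which is
    equivalent to sampling from the proposal truncated to valid households."""
    for _ in range(max_tries):
        household = draw_household(rng, size)
        if satisfies_edits(household):
            return household
    raise RuntimeError("no valid household found within max_tries")
```

Calling `draw_valid_household(random.Random(0), 3)` returns a three-person household that passes the check. Impossible combinations, such as a child older than the head, have zero probability under the truncated distribution, which is the sense in which they act as structural zeros.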
Pages: 498-519
Number of pages: 22