SIMULTANEOUS EDIT AND IMPUTATION FOR HOUSEHOLD DATA WITH STRUCTURAL ZEROS

Citations: 0
Authors
Akande, Olanrewaju [1]
Barrientos, Andres [1]
Reiter, Jerome P. [2]
Affiliations
[1] Duke Univ, Dept Stat Sci, POB 90251, Durham, NC 27708 USA
[2] Duke Univ, Stat Sci, Durham, NC 27708 USA
Funding
U.S. National Science Foundation
Keywords
Categorical; Census; Latent; Measurement error; Missing; Mixture; Disclosure limitation
DOI
10.1093/jssam/smy022
Chinese Library Classification
O1 [Mathematics]; C [Social Sciences, General]
Discipline classification codes
03; 0303; 0701; 070101
Abstract
Multivariate categorical data nested within households often include reported values that fail edit constraints (for example, a participating household reports a child's age as older than that of the child's biological parent) and have missing values. Generally, agencies prefer datasets to be free from erroneous or missing values before analyzing them or disseminating them to secondary data users. We present a model-based engine for editing and imputation of household data based on a Bayesian hierarchical model that includes (i) a nested data Dirichlet process mixture of products of multinomial distributions as the model for the true latent values of the data, truncated to allow only households that satisfy all edit constraints, (ii) a model for the location of errors, and (iii) a reporting model for the observed responses in error. The approach propagates uncertainty due to unknown locations of errors and missing values, generates plausible datasets that satisfy all edit constraints, and can preserve multivariate relationships within and across individuals in the same household. We illustrate the approach using data from the 2012 American Community Survey.
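
Illustrative sketch (not the authors' implementation): the abstract describes a latent-data model truncated so that only households satisfying every edit constraint have positive probability. One simple way to picture such truncation is rejection sampling: draw a household from an untruncated model and keep it only if it passes the edits. Everything below is an assumption made for illustration, including the names `satisfies_edits` and `sample_valid_household`, the two roles, the uniform age distribution, and the single rule that a child must be younger than the household head. The paper's model is a nested Dirichlet process mixture, which this sketch does not attempt to reproduce.

```python
import random

# Hypothetical age support, chosen only for illustration.
AGES = list(range(0, 91))


def satisfies_edits(household):
    """Return True if the household passes the (single, assumed) edit rule.

    `household` is a list of (role, age) tuples. The rule requires exactly
    one head and every child to be strictly younger than that head; it is a
    stand-in for a full set of edit constraints.
    """
    head_ages = [age for role, age in household if role == "head"]
    if len(head_ages) != 1:
        return False
    head_age = head_ages[0]
    return all(age < head_age for role, age in household if role == "child")


def sample_valid_household(size, rng, max_tries=10_000):
    """Draw from an untruncated toy model until a draw passes every edit."""
    for _ in range(max_tries):
        household = [("head", rng.choice(AGES))]
        household += [("child", rng.choice(AGES)) for _ in range(size - 1)]
        if satisfies_edits(household):
            return household
    raise RuntimeError("no valid household found within max_tries")


rng = random.Random(0)
household = sample_valid_household(3, rng)
```

Accepted draws follow the toy model restricted to the feasible set, which is the sense in which combinations that violate the edits act as structural zeros. Rejection sampling becomes inefficient when the feasible set is a small fraction of all combinations, so it should be read as a conceptual aid only.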
Pages: 498-519
Page count: 22