Evaluating a sequential tree-based procedure for multivariate imputation of complex missing data structures

Cited by: 0
Authors
Riccardo Borgoni
Ann Berrington
Affiliations
[1] University of Milano-Bicocca, Department of Statistics
[2] University of Southampton, Division of Social Statistics, Southampton Statistical Sciences Research Institute
Source
Quality & Quantity | 2013 / Vol. 47
Keywords
Missing data; Sequential imputation; Classification tree; 1970 British Birth Cohort
DOI
Not available
Abstract
Item nonresponse in survey data can pose significant problems for social scientists carrying out statistical modeling using a large number of explanatory variables. A number of imputation methods exist, but many deal only with univariate imputation or relatively simple cases of multivariate imputation, often assuming a monotone pattern of missingness. In this paper we evaluate a tree-based approach for multivariate imputation using real data from the 1970 British Cohort Study, known for its complex pattern of nonresponse. The performance of this tree-based approach is compared to mode imputation and a sequential regression-based approach within a simulation study.
Pages: 1991-2008
Number of pages: 17