Recursive partitioning for missing data imputation in the presence of interaction effects

Cited by: 171
Authors
Doove, L. L. [1, 2]
Van Buuren, S. [1, 3]
Dusseldorp, E. [2, 3]
Affiliations
[1] Univ Utrecht, Fac Social Sci, Dept Methodol & Stat, NL-3508 TC Utrecht, Netherlands
[2] Katholieke Univ Leuven, Dept Psychol, Louvain, Belgium
[3] TNO, Netherlands Org Appl Sci Res, NL-2301 CE Leiden, Netherlands
Keywords
CART; Classification and regression trees; Interaction problem; MICE; Nonlinear relations; Random forests; Multiple imputation; Regression trees
DOI
10.1016/j.csda.2013.10.025
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Standard approaches to implement multiple imputation do not automatically incorporate nonlinear relations like interaction effects. This leads to biased parameter estimates when interactions are present in a dataset. With the aim of providing an imputation method which preserves interactions in the data automatically, the use of recursive partitioning as imputation method is examined. Three recursive partitioning techniques are implemented in the multiple imputation by chained equations framework. It is investigated, using simulated data, whether recursive partitioning creates appropriate variability between imputations and unbiased parameter estimates with appropriate confidence intervals. It is concluded that, when interaction effects are present in a dataset, substantial gains are possible by using recursive partitioning for imputation compared to standard applications. In addition, it is shown that the potential of recursive partitioning imputation approaches depends on the relevance of a possible interaction effect, the correlation structure of the data, and the type of possible interaction effect present in the data. © 2013 Elsevier B.V. All rights reserved.
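The abstract does not spell out the imputation step, so the following is only a minimal sketch of one common way to impute a single variable with a regression tree: grow a tree on the rows where the variable is observed, drop each incomplete row down the tree, and draw a random donor value from the observed values in the leaf it reaches. The function names (`grow_tree`, `leaf_donors`, `cart_impute`), the `min_leaf` parameter, and the toy data are all assumptions made for illustration; this is not the authors' implementation, and a full chained-equations procedure would cycle such a step over every incomplete variable and repeat it to produce multiple imputed datasets.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def grow_tree(X, y, min_leaf=5):
    """Grow a small regression tree; each leaf keeps its observed y values as donors."""
    best = None
    for j in range(len(X[0])):
        # Candidate thresholds: every distinct value except the largest.
        for t in sorted(set(row[j] for row in X))[:-1]:
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    # Stop when no admissible split reduces the within-node variation.
    if best is None or best[0] >= sse(y):
        return {"donors": list(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": grow_tree([X[i] for i in left], [y[i] for i in left], min_leaf),
        "right": grow_tree([X[i] for i in right], [y[i] for i in right], min_leaf),
    }

def leaf_donors(tree, row):
    """Follow the splits for one row and return the donor pool of its leaf."""
    while "donors" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["donors"]

def cart_impute(X, y, rng, min_leaf=5):
    """Fill each missing y (None) with a random donor from the matching leaf."""
    observed = [i for i, v in enumerate(y) if v is not None]
    tree = grow_tree([X[i] for i in observed], [y[i] for i in observed], min_leaf)
    return [
        v if v is not None else rng.choice(leaf_donors(tree, X[i]))
        for i, v in enumerate(y)
    ]

# Toy data (hypothetical): y is a pure interaction, 10 * x1 * x2.
rng = random.Random(0)
X = [[a, b] for a in (0, 1) for b in (0, 1) for _ in range(10)]
y = [10.0 * a * b for a, b in X]
y_miss = [None if i % 5 == 0 else v for i, v in enumerate(y)]
completed = cart_impute(X, y_miss, rng)
```

In this noise-free toy example every leaf holds identical donor values, so the imputed values follow the product term even though no interaction was specified anywhere. With real data the donors within a leaf differ, and the random draw is what produces variability between imputations.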
Pages: 92-104
Number of pages: 13