Imputing cross-sectional missing data: comparison of common techniques

Cited by: 171
Authors
Hawthorne, G [1]
Elliott, P [1]
Affiliations
[1] Univ Melbourne, Dept Psychiat, Australian Ctr Posttraumat Mental Hlth, Heidelberg, Vic 3081, Australia
Keywords
horizontal mean substitution; hot deck imputation; imputation; item mean substitution; listwise deletion; missing data; person mean substitution; regression imputation; vertical mean substitution
DOI
10.1111/j.1440-1614.2005.01630.x
Chinese Library Classification
R749 [Psychiatry]
Subject classification code
100205
Abstract
Objective: Increasing awareness of how missing data affect the analysis of clinical and public health interventions has led to increasing numbers of missing data procedures. There is little advice regarding which procedures should be selected under different circumstances. This paper compares six popular procedures: listwise deletion, item mean substitution, person mean substitution at two levels, regression imputation and hot deck imputation. Method: Using a complete dataset, each was examined under a variety of sample sizes and differing levels of missing data. The criteria were the true t-values for the entire sample. Results: The results suggest important differences. If missing data are from a scale where about half the items are present, hot deck imputation or person mean substitution is best. Because person mean substitution is computationally simpler, similar in its efficiency, advocated by other researchers and more likely to be an option in statistical software packages, it is the method of choice. If the missing data are from a scale where more than half the items are missing, or with single-item measures, then hot deck imputation is recommended. The findings also showed that listwise deletion and item mean substitution performed poorly. Conclusions: Person mean and hot deck imputation are preferred. Since listwise deletion and item mean substitution performed poorly, yet are the most widely reported methods, the findings have broad implications.
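As an illustration of three of the procedures named in the abstract, the following is a minimal Python sketch of listwise deletion, item mean substitution and person mean substitution. It is not the authors' implementation: the toy data, the function names and the `min_present` threshold are all assumptions made for this example, and the abstract does not state which thresholds the paper's "two levels" of person mean substitution refer to. Regression and hot deck imputation are not shown.

```python
from statistics import mean

# Hypothetical toy data: 4 respondents x 4 scale items; None marks a missing item.
responses = [
    [4, 5, None, 3],
    [2, None, 2, 2],
    [3, 3, 3, None],
    [5, 4, 5, 5],
]

def listwise_deletion(rows):
    """Keep only respondents with no missing items."""
    return [list(row) for row in rows if None not in row]

def item_mean_substitution(rows):
    """Fill a missing item with the mean of that item across respondents."""
    n_items = len(rows[0])
    col_means = []
    for j in range(n_items):
        observed = [row[j] for row in rows if row[j] is not None]
        col_means.append(mean(observed) if observed else None)
    return [[col_means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def person_mean_substitution(rows, min_present=0.5):
    """Fill a missing item with the mean of the same respondent's observed items.

    Respondents with less than `min_present` of their items observed are left
    unchanged. The 0.5 default is an assumption for illustration only.
    """
    filled = []
    for row in rows:
        observed = [v for v in row if v is not None]
        if observed and len(observed) / len(row) >= min_present:
            person_mean = mean(observed)
            filled.append([person_mean if v is None else v for v in row])
        else:
            filled.append(list(row))
    return filled

complete_cases = listwise_deletion(responses)
item_filled = item_mean_substitution(responses)
person_filled = person_mean_substitution(responses)
```

In this toy example listwise deletion keeps only one of the four respondents, which shows how quickly it can shrink a sample. Item mean substitution fills a gap with the column average regardless of who the respondent is, whereas person mean substitution uses the respondent's own other answers on the scale.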
Pages: 583-590
Number of pages: 8