Principled missing data methods for researchers

Cited by: 1486
Authors
Dong, Yiran [1 ]
Peng, Chao-Ying Joanne [1 ]
Affiliations
[1] Indiana Univ, Bloomington, IN 47405 USA
Source
SPRINGERPLUS | 2013 / Vol. 2
Keywords
Missing data; Listwise deletion; MI; FIML; EM; MAR; MCAR; MNAR; MULTIPLE IMPUTATION; MAXIMUM-LIKELIHOOD; CHAINED EQUATIONS; PERFORMANCE; SOFTWARE; UPDATE; VALUES; STATE
DOI
10.1186/2193-1801-2-222
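Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract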
The impact of missing data on quantitative research can be serious, leading to biased estimates of parameters, loss of information, decreased statistical power, increased standard errors, and weakened generalizability of findings. In this paper, we discuss and demonstrate three principled missing data methods (multiple imputation, full information maximum likelihood, and the expectation-maximization algorithm) applied to a real-world data set. Results are contrasted with those obtained from the complete data set and from the listwise deletion method. The relative merits of each method are noted, along with the common features they share. The paper concludes with an emphasis on the importance of statistical assumptions and with recommendations for researchers. The quality of research will be enhanced if (a) researchers explicitly acknowledge missing data problems and the conditions under which they occurred, (b) principled methods are employed to handle missing data, and (c) the appropriate treatment of missing data is incorporated into review standards for manuscripts submitted for publication.
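As a rough illustration of two ideas named in the abstract, the sketch below contrasts listwise deletion with the pooling step of multiple imputation. It is not taken from the paper: the function names `listwise_delete` and `pool_rubin` are hypothetical, missing values are assumed to be encoded as `None`, and the pooling follows the standard Rubin's rules (total variance = within-imputation variance + (1 + 1/m) × between-imputation variance). The imputation step itself is not shown.

```python
from statistics import mean, variance


def listwise_delete(rows):
    """Drop every row that contains at least one missing (None) value."""
    return [row for row in rows if all(value is not None for value in row)]


def pool_rubin(estimates, variances):
    """Pool results from m imputed data sets using Rubin's rules.

    estimates: point estimates of one parameter, one per imputed data set
    variances: squared standard errors of those estimates
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations with matching variances")
    pooled = mean(estimates)          # average point estimate
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance across imputations (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical toy data: (x, y) pairs with some values missing.
rows = [(1, 2), (None, 3), (4, None), (5, 6)]
complete_cases = listwise_delete(rows)   # only the fully observed rows survive

# Hypothetical estimates from three imputed data sets.
pooled_estimate, total_variance = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

In this toy case listwise deletion discards half of the rows, which is the loss of information the abstract warns about, while the pooled variance is larger than any single imputation's variance because it also reflects uncertainty about the missing values.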
Pages: 1 / 17
Page count: 17