A distance-based rounding strategy for post-imputation ordinal data

Cited by: 7
Authors
Demirtas, Hakan [1]
Affiliations
[1] Univ Illinois, Div Epidemiol & Biostat MC923, Chicago, IL 60612 USA
Keywords
multiple imputation; rounding; bias; precision; ordinal data; pattern-mixture models; missing data; performance; assumption; outcomes
DOI
10.1080/02664760902744954
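Chinese Library Classification number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification codes
020208; 070103; 0714
Abstract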
Multiple imputation has emerged as a widely used model-based approach in dealing with incomplete data in many application areas. Gaussian and log-linear imputation models are fairly straightforward to implement for continuous and discrete data, respectively. However, in missing data settings which include a mix of continuous and discrete variables, correct specification of the imputation model could be a daunting task owing to the lack of flexible models for the joint distribution of variables of different nature. This complication, along with accessibility to software packages that are capable of carrying out multiple imputation under the assumption of joint multivariate normality, appears to encourage applied researchers to pragmatically treat the discrete variables as continuous for imputation purposes, and subsequently round the imputed values to the nearest observed category. In this article, I introduce a distance-based rounding approach for ordinal variables in the presence of continuous ones. The first step of the proposed rounding process is predicated upon creating indicator variables that correspond to the ordinal levels, followed by jointly imputing all variables under the assumption of multivariate normality. The imputed values are then converted to the ordinal scale based on their Euclidean distances to a set of indicators, with minimal distance corresponding to the closest match. I compare the performance of this technique to crude rounding via commonly accepted accuracy and precision measures with simulated data sets.
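The rounding step described in the abstract can be illustrated with a minimal sketch. It is not the article's implementation: the abstract does not state how the indicators are coded, so the sketch assumes one indicator column per ordinal level (one-hot coding), and the function names `distance_based_round` and `crude_round` as well as the imputed values are hypothetical. The imputation itself is not shown; the input stands in for values already imputed under multivariate normality.

```python
import numpy as np

def distance_based_round(imputed_indicators, levels):
    """Assign each row of imputed indicator values to the ordinal level
    whose indicator pattern is closest in Euclidean distance."""
    imputed = np.asarray(imputed_indicators, dtype=float)
    levels = np.asarray(levels)
    # Reference patterns: row k is the indicator vector of level k
    # (assumption: one-hot coding with one column per level).
    patterns = np.eye(len(levels))
    # Pairwise Euclidean distances, shape (n_rows, n_levels).
    dists = np.linalg.norm(imputed[:, None, :] - patterns[None, :, :], axis=2)
    # Minimal distance corresponds to the closest match.
    return levels[np.argmin(dists, axis=1)]

def crude_round(imputed_values, levels):
    """Baseline: round values imputed on the ordinal scale itself
    to the nearest observed category."""
    values = np.asarray(imputed_values, dtype=float)
    levels = np.asarray(levels)
    nearest = np.argmin(np.abs(values[:, None] - levels[None, :]), axis=1)
    return levels[nearest]

# Hypothetical imputed indicator values for a three-level ordinal variable.
imputed = np.array([
    [0.70, 0.20, 0.10],
    [0.10, 0.30, 0.60],
    [-0.20, 0.55, 0.65],
])
rounded = distance_based_round(imputed, levels=[1, 2, 3])
print(rounded.tolist())  # -> [1, 3, 3]
```

Under this one-hot assumption, the nearest pattern is always the one whose indicator received the largest imputed value, which is why the third row maps to level 3 even though two of its indicators are close.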
Pages: 489-500
Page count: 12