A semi-parametric approach to impute mixed continuous and categorical data

Cited by: 1
Authors
Helenowski I.B. [1]
Demirtas H. [2]
McGee M.F. [3]
Affiliations
[1] Department of Preventive Medicine, Feinberg School of Medicine, Northwestern University, Chicago, IL
[2] Division of Epidemiology & Biostatistics, School of Public Health, University of Illinois–Chicago, Chicago, IL
[3] Department of Surgery, Feinberg School of Medicine, Northwestern University, Chicago, IL
Keywords
Categorical data; Multiple imputation; Ordinal data; Semi-parametric
DOI
10.1007/s10742-014-0127-8
Abstract
We propose an extension of the method presented in Helenowski and Demirtas (2013), which imputes mixed continuous and binary data, to data involving categorical variables with three or more levels. In the bivariate case, the median of the continuous variable is computed within each level of the categorical variable, and the categorical variable is recoded as an ordinal variable whose levels follow the rank order of these medians. In the multivariate case, the categorical variables are ordered with respect to the continuous variable for which the range among the medians is the largest. Here, ‘bivariate’ indicates that the data set includes two variables, while ‘multivariate’ indicates that it includes three or more variables. The pairwise correlation between the continuous and ordinal variables is then computed. The data are then transformed to normally distributed values, imputed via joint modeling, and back-transformed to the original scale via the Barton and Schruben (1993) technique for the continuous variable and via quantiles based on the original probabilities of the categorical variable. The algorithm is iterated until the absolute difference between the pairwise correlations from the original and imputed data is less than some constant c, chosen to maximize the coverage rate and minimize standardized bias. Results from simulations applied to artificial data and to real data involving 74 colorectal patients indicate that our technique is promising. © 2014, Springer Science+Business Media New York.
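The ordering step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy data, the handling of missing values (pairs containing `None` are skipped), and the tie-breaking rule are all assumptions made for illustration. The abstract also does not say whether the continuous variables are rescaled before their median ranges are compared; the sketch compares raw ranges.

```python
from collections import defaultdict
from statistics import median


def category_medians(continuous, categorical):
    """Median of the continuous variable within each categorical level.

    Pairs with a missing entry (None) are skipped; this is an assumption,
    since the abstract does not say how missing values enter this step.
    """
    groups = defaultdict(list)
    for value, level in zip(continuous, categorical):
        if value is not None and level is not None:
            groups[level].append(value)
    return {level: median(values) for level, values in groups.items()}


def ordinal_levels(continuous, categorical):
    """Bivariate case: rank the categorical levels by their medians.

    Returns a mapping level -> ordinal code, where 1 is the smallest median.
    Ties are broken by the level's label (an assumption of this sketch).
    """
    medians = category_medians(continuous, categorical)
    ordered = sorted(medians, key=lambda level: (medians[level], str(level)))
    return {level: rank for rank, level in enumerate(ordered, start=1)}


def reference_variable(continuous_vars, categorical):
    """Multivariate case: name of the continuous variable whose
    per-level medians span the largest range (raw, unscaled)."""
    def median_range(values):
        medians = category_medians(values, categorical).values()
        return max(medians) - min(medians)

    return max(continuous_vars, key=lambda name: median_range(continuous_vars[name]))


# Hypothetical toy data: one three-level categorical variable and two
# continuous variables, with one missing value in x1.
group = ["a", "b", "a", "c", "b", "c"]
x1 = [61, 70, 59, 48, 72, None]
x2 = [24, 25, 26, 25, 24, 27]

print(ordinal_levels(x1, group))
print(reference_variable({"x1": x1, "x2": x2}, group))
```

In this toy example the medians of `x1` are 60, 71, and 48 for levels `a`, `b`, and `c`, so `c` receives ordinal code 1, `a` code 2, and `b` code 3. The remaining steps (normal transformation, joint-model imputation, back-transformation, and the correlation-based stopping rule) are not sketched here.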
Pages: 183-193
Page count: 10
Related papers
50 in total
  • [1] A semi-parametric approach for imputing mixed data
    Helenowski, Irene B.
    Demirtas, Hakan
    STATISTICS AND ITS INTERFACE, 2013, 6 (03) : 399 - 412
  • [2] Semi-parametric Dynamic Models for Longitudinal Ordinal Categorical Data
    Sutradhar, Brajendra C.
    SANKHYA-SERIES A-MATHEMATICAL STATISTICS AND PROBABILITY, 2018, 80 (01): : 80 - 109
  • [3] A semi-parametric Bayesian model for semi-continuous longitudinal data
    Ren, Junting
    Tapert, Susan
    Fan, Chun Chieh
    Thompson, Wesley K.
    STATISTICS IN MEDICINE, 2022, 41 (13) : 2354 - 2374
  • [4] Learning from Biased Data: A Semi-Parametric Approach
    Bertail, Patrice
    Clemencon, Stephan
    Guyonvarch, Yannick
    Noiry, Nathan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] A semi-parametric Bayesian approach to generalized linear mixed models
    Kleinman, KP
    Ibrahim, JG
    STATISTICS IN MEDICINE, 1998, 17 (22) : 2579 - 2596
  • [6] Inference in semi-parametric spline mixed models for longitudinal data
    Sinha S.K.
    Sattar A.
    METRON, 2015, 73 (3) : 377 - 395
  • [7] Inferences in semi-parametric dynamic mixed models for longitudinal count data
    Zheng, Nan
    Sutradhar, Brajendra C.
    ANNALS OF THE INSTITUTE OF STATISTICAL MATHEMATICS, 2018, 70 (01) : 215 - 247
  • [8] Inferences in semi-parametric dynamic mixed models for longitudinal count data
    Nan Zheng
    Brajendra C. Sutradhar
    Annals of the Institute of Statistical Mathematics, 2018, 70 : 215 - 247
  • [9] A semi-parametric approach to risk management
    Bingham, NH
    Kiesel, R
    Schmidt, R
    QUANTITATIVE FINANCE, 2003, 3 (06) : 426 - 441
  • [10] Absolutely Continuous Semi-parametric Bivariate Distributions
    Samanta, Debashis
    Kundu, Debasis
    SANKHYA-SERIES B-APPLIED AND INTERDISCIPLINARY STATISTICS, 2025,