A Novel Imputation Approach for Sharing Protected Public Health Data

Cited by: 5
Authors
Erdman, Elizabeth A. [1 ]
Young, Leonard D. [2 ]
Bernson, Dana L. [1 ]
Bauer, Cici [4 ]
Chui, Kenneth [3 ]
Stopka, Thomas J. [5 ,6 ]
Affiliations
[1] Commonwealth Massachusetts, Off Populat Hlth, Dept Publ Hlth, Boston, MA USA
[2] Commonwealth Massachusetts, Bur Hlth Profess Licensure, Dept Publ Hlth, Boston, MA USA
[3] Tufts Univ, Dept Publ Hlth & Community Med, Boston, MA USA
[4] Univ Texas Hlth Sci Ctr Houston, Dept Biostat & Data Sci, Houston, TX USA
[5] Tufts Univ, Tufts Clin & Translat Sci Inst, Medford, MA USA
[6] Tufts Univ, Dept Publ Hlth & Community Med, Medford, MA USA
Keywords
MISSING DATA;
DOI
10.2105/AJPH.2021.306432
Chinese Library Classification number
R1 [Preventive Medicine, Hygiene];
Discipline classification code
1004 ; 120402 ;
Abstract
Objectives. To develop an imputation method to produce estimates for suppressed values within a shared government administrative data set to facilitate accurate data sharing and statistical and spatial analyses. Methods. We developed an imputation approach that incorporated known features of suppressed Massachusetts surveillance data from 2011 to 2017 to predict missing values more precisely. Our methods for 35 de-identified opioid prescription data sets combined modified previous or next substitution followed by mean imputation and a count adjustment to estimate suppressed values before sharing. We modeled 4 methods and compared the results to baseline mean imputation. Results. We assessed performance by comparing root mean squared error (RMSE), mean absolute error (MAE), and proportional variance between imputed and suppressed values. Our method outperformed mean imputation; we retained 46% of the suppressed value's proportional variance with better precision (22% lower RMSE and 26% lower MAE) than simple mean imputation. Conclusions. Our easy-to-implement imputation technique largely overcomes the adverse effects of low count value suppression with superior results to simple mean imputation. This novel method is generalizable to researchers sharing protected public health surveillance data.
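The evaluation described in the abstract (RMSE and MAE between imputed and suppressed values, with mean imputation as the baseline) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The toy counts, the suppression range of 1 to 10, the midpoint used as the mean-imputation fill value, and the function `clipped_neighbor_impute` (a hypothetical stand-in for the paper's "modified previous or next substitution", whose details the abstract does not give) are all assumptions made for illustration.

```python
import math


def rmse(true_values, imputed_values):
    """Root mean squared error between paired true and imputed values."""
    squared = [(t - p) ** 2 for t, p in zip(true_values, imputed_values)]
    return math.sqrt(sum(squared) / len(squared))


def mae(true_values, imputed_values):
    """Mean absolute error between paired true and imputed values."""
    absolute = [abs(t - p) for t, p in zip(true_values, imputed_values)]
    return sum(absolute) / len(absolute)


def mean_impute(series, fill_value):
    """Baseline: replace every suppressed entry (None) with one constant."""
    return [fill_value if v is None else v for v in series]


def clipped_neighbor_impute(series, low, high):
    """Hypothetical previous-or-next substitution (NOT the paper's method).

    A suppressed entry takes the previous observed value, or the next one
    if there is no previous observed value, clipped into the assumed
    suppression range [low, high]. With no observed neighbor it falls
    back to the midpoint of that range.
    """
    result = []
    for idx, value in enumerate(series):
        if value is not None:
            result.append(value)
            continue
        candidates = []
        if idx > 0 and series[idx - 1] is not None:
            candidates.append(series[idx - 1])
        if idx + 1 < len(series) and series[idx + 1] is not None:
            candidates.append(series[idx + 1])
        if candidates:
            result.append(min(max(candidates[0], low), high))
        else:
            result.append((low + high) / 2)
    return result


def demo():
    # Toy counts (invented); values from 1 to 10 are treated as suppressed.
    truth = [3, 12, 8, 15, 2, 0, 6]
    low, high = 1, 10
    published = [None if low <= v <= high else v for v in truth]
    hidden = [i for i, v in enumerate(published) if v is None]

    methods = {
        "mean imputation": mean_impute(published, (low + high) / 2),
        "clipped neighbor": clipped_neighbor_impute(published, low, high),
    }
    # Score each method only on the positions that were suppressed.
    for name, imputed in methods.items():
        t = [truth[i] for i in hidden]
        p = [imputed[i] for i in hidden]
        print(f"{name}: RMSE={rmse(t, p):.3f}, MAE={mae(t, p):.3f}")


if __name__ == "__main__":
    demo()
```

Errors are computed only over the suppressed positions, since those are the only values an imputation method changes. The toy numbers say nothing about the relative performance reported in the paper.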
Pages: 1830-1838
Page count: 9