Spot the difference: comparing results of analyses from real patient data and synthetic derivatives

Cited by: 39
Authors
Foraker, Randi E. [1,2]
Yu, Sean C. [2]
Gupta, Aditi [2]
Michelson, Andrew P. [3]
Soto, Jose A. Pineda [4]
Colvin, Ryan [2,4]
Loh, Francis [5]
Kollef, Marin H. [3]
Maddox, Thomas [6]
Evanoff, Bradley [1]
Dror, Hovav [7]
Zamstein, Noa [7]
Lai, Albert M. [1,2]
Payne, Philip R. O. [1,2]
Affiliations
[1] Washington Univ, Div Gen Med Sci, Dept Med, St Louis, MO 63110 USA
[2] Washington Univ, Inst Informat, Dept Med, Sch Med, St Louis, MO 63110 USA
[3] Washington Univ, Div Pulm & Crit Care Med, Dept Med, Sch Med, St Louis, MO 63110 USA
[4] Childrens Hosp Los Angeles, Div Crit Care Med, Dept Anesthesiol & Crit Care Med, Los Angeles, CA 90027 USA
[5] Washington Univ, Sch Med, St Louis, MO 63110 USA
[6] Washington Univ, Sch Med, Healthcare Innovat Lab, BJC Healthcare, St Louis, MO 63110 USA
[7] MDClone Ltd, Beer Sheva, Israel
Keywords
synthetic data; protected health information; precision health care; electronic health records and systems; data analysis; sepsis
DOI
10.1093/jamiaopen/ooaa060
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
Background: Synthetic data may provide a solution for researchers who wish to generate and share data in support of precision healthcare. Recent advances in data synthesis enable the creation and analysis of synthetic derivatives as if they were the original data; this process has significant advantages over data deidentification.
Objectives: To assess a big-data platform with data-synthesizing capabilities (MDClone Ltd., Beer Sheva, Israel) for its ability to produce data that can be used for research purposes while obviating privacy and confidentiality concerns.
Methods: We explored three use cases and tested the robustness of synthetic data by comparing analyses of synthetic derivatives with analyses of the original data, using traditional statistics, machine learning approaches, and spatial representations of the data. We designed these use cases to conduct analyses at the level of individual observations (Use Case 1), patient cohorts (Use Case 2), and populations (Use Case 3).
Results: For each use case, the results of the analyses were sufficiently statistically similar (P > 0.05) between the synthetic derivative and the real data to draw the same conclusions.
Discussion and conclusion: This article presents the results of each use case and outlines key considerations for the use of synthetic data, examining their role in clinical research for faster insights and improved data sharing in support of precision healthcare.
Pages: 557-566
Page count: 10
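The abstract describes checking whether an analysis of a synthetic derivative agrees with the same analysis of the original data, using P > 0.05 as the similarity criterion. The record does not say which tests the authors ran, so the following is only a minimal sketch of that general idea, assuming a two-sided permutation test on the difference in means. The function name `permutation_p_value`, its parameters, and the toy values are all hypothetical and are not taken from the study or from the MDClone platform.

```python
import random
import statistics


def permutation_p_value(real, synthetic, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means (illustrative only)."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(real) - statistics.fmean(synthetic))
    pooled = list(real) + list(synthetic)  # copy, so the inputs are not modified
    n_real = len(real)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(
            statistics.fmean(pooled[:n_real]) - statistics.fmean(pooled[n_real:])
        )
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical toy values standing in for one numeric variable; not study data.
toy_rng = random.Random(42)
real = [toy_rng.gauss(100.0, 15.0) for _ in range(200)]
synthetic = [toy_rng.gauss(100.0, 15.0) for _ in range(200)]
shifted = [value + 20.0 for value in real]  # a deliberately poor "synthetic" copy

p_similar = permutation_p_value(real, synthetic)
p_shifted = permutation_p_value(real, shifted)
print(f"drawn from the same distribution: p = {p_similar:.3f}")
print(f"shifted by 20 units:              p = {p_shifted:.3f}")
```

A P value above 0.05 in a test like this means only that no difference was detected at the given sample size; it is weaker evidence than a formal equivalence test, which may be why the abstract also reports machine learning and spatial comparisons.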