Synthetic Data Digital Twins and Data Trusts Control for Privacy in Health Data Sharing

Cited by: 0
Authors
Lomotey, Richard K. [1]
Kumi, Sandra [2]
Ray, Madhurima [3]
Deters, Ralph [2]
Affiliations
[1] Penn State Univ, Informat Sci & Tech, Monaca, PA 15061 USA
[2] Univ Saskatchewan, Dept Comp Sci, Saskatoon, SK, Canada
[3] Penn State Univ, Dept Comp Sci, Monaca, PA USA
Source
PROCEEDINGS OF THE 2024 ACM WORKSHOP ON SECURE AND TRUSTWORTHY CYBER-PHYSICAL SYSTEMS, SAT-CPS 2024 | 2024
Keywords
Synthetic Health Data; Digital Twins; Data Trusts; Machine Learning; Artificial Intelligence; Privacy; Middleware; Framework
DOI
10.1145/3643650.3658605
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Health data sharing is valuable for medical research because it can improve diagnostics, policy, medication, and related areas. At the same time, health data must be shared without compromising the privacy of patients and stakeholders. However, recent advances in AI/ML and sophisticated analytics have been shown to introduce biases that can easily identify patients from their healthcare data, which violates privacy. In this work, we seek to address this major issue by exploring two emerging topics that are gaining attention from industry, academia, and governments: digital twins and data trusts. First, we propose the use of digital twins (DTs) to generate synthetic records of patients' heart rate data. The DTs are virtual replicas of the actual data and were created using two synthetic data generative models: Gaussian Copula (GC) and Tabular Variational Autoencoder (TVAE). The GC and TVAE achieved maximum data quality scores of 88% and 96%, respectively. Next, we posit that the DTs should be shared through a data trusts layer. Data trusts are fiduciary frameworks that govern multi-party data sharing. The data trusts enforce access controls on the synthetic health data (based on criteria such as location, role, and policy) and report to the data subject. Preliminary evaluations show that merging the two techniques (i.e., synthetic data digital twins and data trusts) enforces better privacy for health data access. The synthetic data provides stronger anonymization, while the data trusts provide easy auditing, tracking, and efficient reporting to the patient or data subject. The paper also details the architectural design of the data trusts and evaluates the efficiency of the access control techniques.
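The abstract names Gaussian Copula as one of the two generative models but gives no implementation detail. The following is only a minimal sketch of the general Gaussian copula idea for two numeric columns, written with the Python standard library; it is not the authors' pipeline. The function names, the age and heart-rate columns, and all data values are assumptions made up for illustration.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for cdf / inverse cdf


def to_normal_scores(column):
    """Map each value to a standard-normal score via its empirical rank."""
    n = len(column)
    order = sorted(range(n), key=lambda i: column[i])
    scores = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        # rank / (n + 1) keeps the probability strictly inside (0, 1)
        scores[idx] = STD_NORMAL.inv_cdf(rank / (n + 1))
    return scores


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def empirical_quantile(sorted_column, u):
    """Return the observed value at probability level u in [0, 1)."""
    idx = min(int(u * len(sorted_column)), len(sorted_column) - 1)
    return sorted_column[idx]


def sample_gaussian_copula(col_a, col_b, n_samples, seed=0):
    """Draw synthetic (a, b) pairs that mimic the dependence between columns."""
    rng = random.Random(seed)
    # Dependence is captured by the correlation of the normal scores.
    rho = pearson(to_normal_scores(col_a), to_normal_scores(col_b))
    rho = max(-1.0, min(1.0, rho))  # guard against rounding past +/-1
    sorted_a, sorted_b = sorted(col_a), sorted(col_b)
    synthetic = []
    for _ in range(n_samples):
        # Correlated standard normals (2x2 Cholesky factor written out).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(max(0.0, 1.0 - rho ** 2)) * rng.gauss(0.0, 1.0)
        # Back to uniforms, then to each column's empirical marginal.
        u1, u2 = STD_NORMAL.cdf(z1), STD_NORMAL.cdf(z2)
        synthetic.append((empirical_quantile(sorted_a, u1),
                          empirical_quantile(sorted_b, u2)))
    return synthetic


# Made-up toy values for illustration only; not data from the paper.
ages = [25, 32, 41, 47, 53, 58, 64, 70]
heart_rates = [62, 66, 70, 68, 75, 78, 80, 84]
synthetic = sample_gaussian_copula(ages, heart_rates, n_samples=500, seed=42)
```

Because this sketch maps samples back through the empirical marginals, every synthetic value is one that was actually observed in its column; only the pairing of values is new. A practical generator would typically fit smooth marginal distributions instead, and would handle more than two columns with a full correlation matrix.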
Pages: 1 - 10
Page count: 10
Related papers
50 in total
  • [21] On Measuring the Privacy of Anonymized Data in Multiparty Network Data Sharing
    Chen Xiaoyun
    Su Yujie
    Tang Xiaosheng
    Huang Xiaohong
    Ma Yan
    CHINA COMMUNICATIONS, 2013, 10 (05) : 120 - 127
  • [22] Summary Statistic Privacy in Data Sharing
    Lin Z.
    Wang S.
    Sekar V.
    Fanti G.
    IEEE Journal on Selected Areas in Information Theory, 2024, 5 : 369 - 384
  • [23] Sharing is CAIRing: Characterizing principles and assessing properties of universal privacy evaluation for synthetic tabular data
    Hyrup, Tobias
    Lautrup, Anton Danholt
    Zimek, Arthur
    Schneider-Kamp, Peter
    MACHINE LEARNING WITH APPLICATIONS, 2024, 18
  • [24] Data Sharing of Imaging in an Evolving Health Care World: Report of the ACR Data Sharing Workgroup, Part 1: Data Ethics of Privacy, Consent, and Anonymization
    Batlle, Juan Carlos
    Dreyer, Keith
    Allen, Bibb
    Cook, Tessa
    Roth, Christopher J.
    Kitts, Andrea Borondy
    Geis, Raym
    Wu, Carol C.
    Lungren, Matt P.
    Patti, Jay
    Prater, Adam
    Rubin, Daniel
    Halabi, Safwan
    Tilkin, Mike
    Hoffman, Tom
    Coombs, Laura
    Wald, Christoph
    JOURNAL OF THE AMERICAN COLLEGE OF RADIOLOGY, 2021, 18 (12) : 1646 - 1654
  • [25] Social network data analysis to highlight privacy threats in sharing data
    Francesca Cerruto
    Stefano Cirillo
    Domenico Desiato
    Simone Michele Gambardella
    Giuseppe Polese
    Journal of Big Data, 9
  • [26] Social network data analysis to highlight privacy threats in sharing data
    Cerruto, Francesca
    Cirillo, Stefano
    Desiato, Domenico
    Gambardella, Simone Michele
    Polese, Giuseppe
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [27] An anonymization-based privacy-preserving data collection protocol for digital health data
    Andrew, J.
    Eunice, R. Jennifer
    Karthikeyan, J.
    FRONTIERS IN PUBLIC HEALTH, 2023, 11
  • [28] Data Integration for Digital Twins in Industrial Automation: A Systematic Literature Review
    Hildebrandt, Gary
    Dittler, Daniel
    Habiger, Pascal
    Drath, Rainer
    Weyrich, Michael
    IEEE ACCESS, 2024, 12 : 139129 - 139153
  • [29] Generating synthetic data using GANs fusion in the digital twins model for sonars
    Polap, Dawid
    Jaszcz, Antoni
    Prokop, Katarzyna
    2024 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN 2024, 2024
  • [30] Utility and Privacy Assessments of Synthetic Data for Regression Tasks
    Hittmeir, Markus
    Ekelhart, Andreas
    Mayer, Rudolf
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019 : 5763 - 5772