Synthetic data for privacy-preserving clinical risk prediction

Cited by: 5
Authors
Qian, Zhaozhi [1]
Callender, Thomas [2]
Cebere, Bogdan [1]
Janes, Sam M. [2]
Navani, Neal [2]
van der Schaar, Mihaela [1, 3]
Affiliations
[1] Univ Cambridge, Cambridge CB2 1TN, England
[2] UCL, London WC1E 6BT, England
[3] Alan Turing Inst, London NW1 2DB, England
Funding
Academy of Finland
Keywords
Synthetic data; Machine learning; Risk-prediction; MIXTURES; UTILITY
DOI
10.1038/s41598-024-72894-y
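Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09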
Abstract
Synthetic data promise privacy-preserving data sharing for healthcare research and development. Compared with other privacy-enhancing approaches, such as federated learning, analyses performed on synthetic data can be applied downstream without modification, so synthetic data can stand in for real data across a wide range of use cases. However, the role that synthetic data might play in all aspects of clinical model development remains unknown. In this work, we used state-of-the-art generators explicitly designed for privacy preservation to create a synthetic version of ever-smokers in the UK Biobank before building prognostic models for lung cancer under several data release assumptions. We demonstrate that synthetic data can be effectively used throughout the medical prognostic modeling pipeline even without eventual access to the real data. Furthermore, we show the implications of different data release approaches for how synthetic biobank data could be deployed within the healthcare system.
Pages: 14