Evaluating the Utility and Privacy of Synthetic Breast Cancer Clinical Trial Data Sets

Cited by: 7
Authors
El Kababji, Samer [1]
Mitsakakis, Nicholas [1]
Fang, Xi [2]
Beltran-Bless, Ana-Alicia [3, 4]
Pond, Greg [5]
Vandermeer, Lisa [3]
Radhakrishnan, Dhenuka [1]
Mosquera, Lucy [1, 2]
Paterson, Alexander [6]
Shepherd, Lois [7]
Chen, Bingshu [7]
Barlow, William E. [8]
Gralow, Julie [9]
Savard, Marie-France [3, 4]
Clemons, Mark [3, 4]
El Emam, Khaled [1, 2, 10]
Affiliations
[1] CHEO Res Inst, Ottawa, ON, Canada
[2] Repl Analyt Ltd, Ottawa, ON, Canada
[3] Ottawa Hosp Res Inst, Ottawa, ON, Canada
[4] Univ Ottawa, Dept Med, Div Med Oncol, Ottawa, ON, Canada
[5] McMaster Univ, Hamilton, ON, Canada
[6] Alberta Hlth Serv, Edmonton, AB, Canada
[7] Queens Univ, Kingston, ON, Canada
[8] Canc Res & Biostat, Seattle, WA USA
[9] Univ Washington, Seattle, WA USA
[10] Univ Ottawa, Sch Epidemiol & Publ Hlth, Ottawa, ON, Canada
Source
JCO CLINICAL CANCER INFORMATICS | 2023 / Vol. 7
Funding
Canadian Institutes of Health Research; Natural Sciences and Engineering Research Council of Canada
Keywords
RANDOMIZED-TRIAL; DATA GENERATION; MULTICENTER; THERAPY; RISK; UK;
DOI
10.1200/CCI.23.00116
Chinese Library Classification
R73 [Oncology]
Subject classification code
100214
Abstract
PURPOSE There is strong interest from patients, researchers, the pharmaceutical industry, medical journal editors, funders of research, and regulators in sharing clinical trial data for secondary analysis. However, data access remains a challenge because of concerns about patient privacy. It has been argued that synthetic data generation (SDG) is an effective way to address these privacy concerns. There is a dearth of evidence supporting this on oncology clinical trial data sets, and on the utility of privacy-preserving synthetic data. The objective of the proposed study is to validate the utility and privacy risks of synthetic clinical trial data sets across multiple SDG techniques.

METHODS We synthesized data sets from eight breast cancer clinical trial data sets using three types of generative models: sequential synthesis, conditional generative adversarial network, and variational autoencoder. Synthetic data utility was evaluated by replicating the published analyses on the synthetic data and assessing concordance of effect estimates and CIs between real and synthetic data. Privacy was evaluated by measuring attribution disclosure risk and membership disclosure risk.

RESULTS Utility was highest using the sequential synthesis method where all results were replicable and the CI overlap most similar or higher for seven of eight data sets. Both types of privacy risks were low across all three types of generative models.

DISCUSSION Synthetic data using sequential synthesis methods can act as a proxy for real clinical trial data sets, and simultaneously have low privacy risks. This type of generative model can be one way to enable broader sharing of clinical trial data.
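The abstract refers to CI overlap between real and synthetic effect estimates without giving a formula. As a rough illustration only, the sketch below uses one commonly cited definition: the length of the shared interval, averaged as a fraction of each interval's own length. Whether the study used exactly this definition is an assumption; the function name `ci_overlap` and the example interval values are hypothetical and are not taken from the paper.

```python
from typing import Tuple

Interval = Tuple[float, float]  # (lower bound, upper bound)


def ci_overlap(real_ci: Interval, synth_ci: Interval) -> float:
    """Average fractional overlap of two confidence intervals.

    Assumed definition (not confirmed by the abstract):
        0.5 * (shared / width_real + shared / width_synth)
    where shared = min(upper bounds) - max(lower bounds).
    Identical intervals give 1.0; disjoint intervals give a negative value.
    """
    lo_r, hi_r = real_ci
    lo_s, hi_s = synth_ci
    if hi_r <= lo_r or hi_s <= lo_s:
        raise ValueError("each interval must have upper bound > lower bound")
    # Signed length of the region common to both intervals.
    shared = min(hi_r, hi_s) - max(lo_r, lo_s)
    return 0.5 * (shared / (hi_r - lo_r) + shared / (hi_s - lo_s))


if __name__ == "__main__":
    # Hypothetical hazard-ratio intervals, for illustration only.
    real = (0.70, 1.10)
    synthetic = (0.75, 1.20)
    print(f"CI overlap: {ci_overlap(real, synthetic):.3f}")
```

Under this definition a value near 1 means the synthetic analysis reproduces both the location and the width of the real interval, while a value at or below 0 means the two intervals share no range.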
Pages: 10