Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

Times cited: 12
Authors
Eckardt, Jan-Niklas [1, 2]
Hahn, Waldemar [3, 4]
Roellig, Christoph [1]
Stasik, Sebastian [1]
Platzbecker, Uwe [5]
Mueller-Tidow, Carsten [6]
Serve, Hubert [7]
Baldus, Claudia D. [8]
Schliemann, Christoph [9]
Schaefer-Eckart, Kerstin [10, 11]
Hanoun, Maher [12]
Kaufmann, Martin [13]
Burchert, Andreas [14]
Thiede, Christian [1]
Schetelig, Johannes [1]
Sedlmayr, Martin [4]
Bornhaeuser, Martin [1, 15, 16]
Wolfien, Markus [3, 4]
Middeke, Jan Moritz [1, 2]
Affiliations
[1] Tech Univ Dresden, Univ Hosp Carl Gustav Carus, Dept Internal Med 1, Dresden, Germany
[2] Tech Univ Dresden, Else Kroner Fresenius Ctr Digital Hlth, Dresden, Germany
[3] Ctr Scalable Data Analyt & Artificial Intelligence, Leipzig, Germany
[4] Tech Univ Dresden, Inst Med Informat & Biometry, Dresden, Germany
[5] Univ Hosp, Med Clin & Policlin Hematol & Cell Therapy 1, Leipzig, Germany
[6] Univ Hosp Heidelberg, Dept Med 5, Heidelberg, Germany
[7] Goethe Univ Frankfurt, Dept Med Hematol & Oncol 2, Frankfurt, Germany
[8] Univ Hosp Schleswig Holstein, Dept Hematol & Oncol, Kiel, Germany
[9] Univ Hosp Munster, Dept Med A, Munster, Germany
[10] Paracelsus Med Privatuniv, Dept Internal Med 5, Nurnberg, Germany
[11] Univ Hosp Nurnberg, Nurnberg, Germany
[12] Univ Hosp Essen, Dept Hematol, Essen, Germany
[13] Robert Bosch Krankenhaus, Dept Hematol Oncol & Palliat Care, Stuttgart, Germany
[14] Philipps Univ Marburg, Dept Hematol Oncol & Immunol, Marburg, Germany
[15] German Consortium Translat Canc Res DKTK, Heidelberg, Germany
[16] Natl Ctr Tumor Dis NCT, Dresden, Germany
Keywords
HEALTH-CARE; MITOXANTRONE; MUTATIONS; PRIVACY
DOI
10.1038/s41746-024-01076-x
Chinese Library Classification (CLC) number
R19 [Health care organization and services (health services administration)]
Abstract
Clinical research relies on high-quality patient data; however, obtaining large data sets is costly, and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation holds the promise of effectively bypassing these barriers, allowing for simplified data accessibility and the prospect of synthetic control cohorts. We employed two different generative artificial intelligence methodologies, CTAB-GAN+ and normalizing flows (NFlow), to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, who were treated within four multicenter clinical trials. Both generative models accurately captured the distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes, yielding high performance scores for fidelity and usability in both synthetic cohorts (n = 1606 each). Survival analysis demonstrated a close resemblance between the survival curves of the original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis, enabling exploratory analysis of our synthetic data. Additionally, the privacy of training samples is safeguarded, mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof of concept for synthetic data generation in multimodal clinical data for rare diseases, but also full public access to the synthetic data sets to foster further research.
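The abstract states that re-identification risk was quantified with Hamming distances but does not give the procedure. The sketch below shows one common way such a check can be set up, under stated assumptions: each patient record is an equal-length tuple of categorical (or already discretized) feature values, and for every synthetic record we take the Hamming distance to its closest training record. A distance of 0 means the synthetic record reproduces a real patient exactly. The function names, the toy features and the values are hypothetical illustrations, not the authors' code or data; continuous variables such as laboratory values would first need to be binned.

```python
from collections.abc import Hashable, Sequence

# Assumption: a record is a fixed-length sequence of categorical/discretized values.
Record = Sequence[Hashable]


def hamming(a: Record, b: Record) -> int:
    """Number of feature positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of features")
    return sum(x != y for x, y in zip(a, b))


def min_hamming_to_training(
    synthetic: Sequence[Record], training: Sequence[Record]
) -> list[int]:
    """For each synthetic record, the Hamming distance to its nearest training record.

    Raises ValueError if `training` is empty (min() of an empty sequence).
    """
    return [min(hamming(s, t) for t in training) for s in synthetic]


def exact_copy_fraction(distances: Sequence[int]) -> float:
    """Share of synthetic records that duplicate a training record (distance 0)."""
    return sum(d == 0 for d in distances) / len(distances) if distances else 0.0


# Hypothetical toy records: (sex, age band, NPM1 mutated, FLT3-ITD present)
training = [
    ("F", "60-69", 1, 0),
    ("M", "40-49", 0, 1),
    ("M", "70-79", 1, 1),
]
synthetic = [
    ("F", "60-69", 1, 0),  # identical to the first training record
    ("M", "50-59", 0, 1),  # differs from the second training record in age band only
    ("F", "40-49", 0, 0),
]

distances = min_hamming_to_training(synthetic, training)
print(distances)                       # [0, 1, 2]
print(exact_copy_fraction(distances))  # share of exact copies, here 1 of 3
```

Looking at the full distribution of nearest-neighbor distances, rather than only at exact copies, indicates how close synthetic records come to real patients; a cohort whose distances cluster near 0 would warrant more scrutiny than one whose distances resemble those between distinct real patients.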
Pages: 11