Mimicking clinical trials with synthetic acute myeloid leukemia patients using generative artificial intelligence

Times cited: 12
Authors
Eckardt, Jan-Niklas [1 ,2 ]
Hahn, Waldemar [3 ,4 ]
Roellig, Christoph [1 ]
Stasik, Sebastian [1 ]
Platzbecker, Uwe [5 ]
Mueller-Tidow, Carsten [6 ]
Serve, Hubert [7 ]
Baldus, Claudia D. [8 ]
Schliemann, Christoph [9 ]
Schaefer-Eckart, Kerstin [10 ,11 ]
Hanoun, Maher [12 ]
Kaufmann, Martin [13 ]
Burchert, Andreas [14 ]
Thiede, Christian [1 ]
Schetelig, Johannes [1 ]
Sedlmayr, Martin [4 ]
Bornhaeuser, Martin [1 ,15 ,16 ]
Wolfien, Markus [3 ,4 ]
Middeke, Jan Moritz [1 ,2 ]
Affiliations
[1] Tech Univ Dresden, Univ Hosp Carl Gustav Carus, Dept Internal Med 1, Dresden, Germany
[2] Tech Univ Dresden, Else Kroner Fresenius Ctr Digital Hlth, Dresden, Germany
[3] Ctr Scalable Data Analyt & Artificial Intelligence, Leipzig, Germany
[4] Tech Univ Dresden, Inst Med Informat & Biometry, Dresden, Germany
[5] Univ Hosp, Med Clin & Policlin Hematol & Cell Therapy 1, Leipzig, Germany
[6] Univ Hosp Heidelberg, Dept Med 5, Heidelberg, Germany
[7] Goethe Univ Frankfurt, Dept Med Hematol & Oncol 2, Frankfurt, Germany
[8] Univ Hosp Schleswig Holstein, Dept Hematol & Oncol, Kiel, Germany
[9] Univ Hosp Munster, Dept Med A, Munster, Germany
[10] Paracelsus Med Privatuniv, Dept Internal Med 5, Nurnberg, Germany
[11] Univ Hosp Nurnberg, Nurnberg, Germany
[12] Univ Hosp Essen, Dept Hematol, Essen, Germany
[13] Robert Bosch Krankenhaus, Dept Hematol Oncol & Palliat Care, Stuttgart, Germany
[14] Philipps Univ Marburg, Dept Hematol Oncol & Immunol, Marburg, Germany
[15] German Consortium Translat Canc Res DKTK, Heidelberg, Germany
[16] Natl Ctr Tumor Dis NCT, Dresden, Germany
Keywords
HEALTH-CARE; MITOXANTRONE; MUTATIONS; PRIVACY
DOI
10.1038/s41746-024-01076-x
Chinese Library Classification (CLC) number
R19 [Health Care Organization and Services (Health Services Administration)]
Abstract
Clinical research relies on high-quality patient data; however, obtaining large data sets is costly, and access to existing data is often hindered by privacy and regulatory concerns. Synthetic data generation promises to bypass these barriers, simplifying data access and opening the prospect of synthetic control cohorts. We employed two different generative artificial intelligence methods, CTAB-GAN+ and normalizing flows (NFlow), to synthesize patient data derived from 1606 patients with acute myeloid leukemia, a heterogeneous hematological malignancy, who were treated within four multicenter clinical trials. Both generative models accurately captured the distributions of demographic, laboratory, molecular and cytogenetic variables, as well as patient outcomes, yielding high fidelity and usability scores for both synthetic cohorts (n = 1606 each). Survival analysis showed close resemblance between the survival curves of the original and synthetic cohorts. Inter-variable relationships were preserved in univariable outcome analysis, enabling exploratory analysis of the synthetic data. Additionally, the privacy of the training samples is safeguarded, mitigating possible patient re-identification, which we quantified using Hamming distances. We provide not only a proof of concept for synthetic data generation from multimodal clinical data in rare diseases, but also full public access to the synthetic data sets to foster further research.
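The abstract states that re-identification risk was quantified with Hamming distances but does not spell out the procedure. The following is a minimal sketch, assuming each patient record has already been discretized into an equal-length tuple of categorical values. For every synthetic record, it finds the smallest Hamming distance (the number of differing attributes) to any training record. A distance of 0 flags a synthetic patient that exactly copies a real one. The function names and toy records are hypothetical illustrations, not the authors' implementation or study data.

```python
from typing import List, Sequence

# Assumption: one discretized patient record = a tuple of categorical values,
# with attributes in the same order for every record.
Record = Sequence[object]


def hamming(a: Record, b: Record) -> int:
    """Count the attribute positions at which two equal-length records differ."""
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def min_hamming_to_training(
    synthetic: Sequence[Record], training: Sequence[Record]
) -> List[int]:
    """For each synthetic record, return its smallest Hamming distance to any training record."""
    return [min(hamming(s, t) for t in training) for s in synthetic]


def exact_copy_fraction(distances: Sequence[int]) -> float:
    """Share of synthetic records that reproduce a training record exactly (distance 0)."""
    if not distances:
        raise ValueError("no distances given")
    return sum(d == 0 for d in distances) / len(distances)


# Hypothetical toy records (sex, NPM1 status, FLT3-ITD status, response); not study data.
training = [("F", "NPM1+", "FLT3-ITD-", "CR"), ("M", "NPM1-", "FLT3-ITD+", "no CR")]
synthetic = [("F", "NPM1+", "FLT3-ITD-", "CR"), ("M", "NPM1+", "FLT3-ITD+", "no CR")]

distances = min_hamming_to_training(synthetic, training)
print(distances)                       # → [0, 1]
print(exact_copy_fraction(distances))  # → 0.5
```

Larger minimum distances suggest that synthetic patients are not near-copies of real ones. How continuous laboratory values would be binned before comparison is a further assumption this sketch leaves open.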
Pages: 11