Generation of Synthetic Tabular Healthcare Data Using Generative Adversarial Networks

Cited by: 4
Authors
Nik, Alireza Hossein Zadeh [1, 2]
Riegler, Michael A. [1, 3]
Halvorsen, Pal [1, 4]
Storas, Andrea M. [1, 4]
Affiliations
[1] SimulaMet, Oslo, Norway
[2] Univ Stavanger, Stavanger, Norway
[3] Univ Tromso, Tromso, Norway
[4] OsloMet, Oslo, Norway
Source
Keywords
Synthetic data generation; Deep learning; Medical data
DOI
10.1007/978-3-031-27077-2_34
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High-quality tabular data is a crucial requirement for developing data-driven applications, especially healthcare-related ones, because most of the data collected in this context today is in tabular form. However, strict data protection laws complicate access to medical datasets. Thus, synthetic data has become an ideal alternative for data scientists and healthcare professionals to circumvent such hurdles. Although many healthcare institutions still use classical de-identification and anonymization techniques for generating synthetic data, deep learning-based generative models such as generative adversarial networks (GANs) have shown remarkable performance in generating tabular datasets with complex structures. This paper examines the potential and applicability of GANs within the healthcare industry, which often faces serious challenges with insufficient training data and the sensitivity of patient records. We investigate several state-of-the-art GAN-based models proposed for tabular synthetic data generation. Healthcare datasets with different sizes, numbers of variables, column data types, feature distributions, and inter-variable correlations are examined. Moreover, a comprehensive evaluation framework is defined to evaluate the quality of the synthetic records and the viability of each model in preserving patients' privacy. The results indicate that the proposed models can generate synthetic datasets that maintain the statistical characteristics, model compatibility, and privacy of the original data. Moreover, synthetic tabular healthcare datasets can be a viable option in many data-driven applications. However, there is still room for further improvement in designing a perfect architecture for generating synthetic tabular data.
Pages: 434 - 446
Number of pages: 13
Related papers
50 in total
  • [31] Generative Adversarial Networks applied to synthetic financial scenarios generation
    Rizzato, Matteo
    Wallart, Julien
    Geissler, Christophe
    Morizet, Nicolas
    Boumlaik, Noureddine
    PHYSICA A-STATISTICAL MECHANICS AND ITS APPLICATIONS, 2023, 623
  • [32] Synthetic Dataset Generation for Text Recognition with Generative Adversarial Networks
    Efimova, Valeria
    Shalamov, Viacheslav
    Filchenkov, Andrey
    TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019), 2020, 11433
  • [33] Synthetic Intrusion Alert Generation through Generative Adversarial Networks
    Sweet, Christopher
    Moskal, Stephen
    Yang, Shanchieh Jay
    MILCOM 2019 - 2019 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM), 2019,
  • [34] SynSigGAN: Generative Adversarial Networks for Synthetic Biomedical Signal Generation
    Hazra, Debapriya
    Byun, Yung-Cheol
    BIOLOGY-BASEL, 2020, 9 (12) : 1 - 20
  • [35] Contactless Blood Pressure Measurement Via Remote Photoplethysmography With Synthetic Data Generation Using Generative Adversarial Networks
    Wu, Bing-Fei
    Chiu, Li-Wen
    Wu, Yi-Chiao
    Lai, Chun-Chih
    Cheng, Hao-Min
    Chu, Pao-Hsien
    IEEE JOURNAL OF BIOMEDICAL AND HEALTH INFORMATICS, 2024, 28 (02) : 621 - 632
  • [36] Generation of synthetic ground glass opacities (GGOs) using generative adversarial networks (GANs)
    Wang, Z.
    Zhang, Z.
    Hendriks, L. E. L.
    Miclea, R.
    Gietema, H.
    Schoenmaekers, J.
    Wee, L.
    Dekker, A.
    Traverso, A.
    ANNALS OF ONCOLOGY, 2022, 33 : S80 - S80
  • [37] Empirical Evaluation on Synthetic Data Generation with Generative Adversarial Network
    Lu, Pei-Hsuan
    Wang, Pang-Chieh
    Yu, Chia-Mu
    PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE, MINING AND SEMANTICS (WIMS 2019), 2019,
  • [38] Curtaining artifacts generation on synthetic FIB-SEM data via Generative Adversarial Networks
    Roldan, Diego
    Barbosa-Torres, Luis
    OPTICS COMMUNICATIONS, 2025, 574
  • [39] Generative Adversarial Networks for Synthetic Data Generation in Finance: Evaluating Statistical Similarities and Quality Assessment
    Ramzan, Faisal
    Sartori, Claudio
    Consoli, Sergio
    Recupero, Diego Reforgiato
    AI, 2024, 5 (02) : 667 - 685
  • [40] Enhancing Histopathological Image Classification Performance through Synthetic Data Generation with Generative Adversarial Networks
    Ruiz-Casado, Jose L.
    Molina-Cabello, Miguel A.
    Luque-Baena, Rafael M.
    SENSORS, 2024, 24 (12)