A Systematic Review of Synthetic Data Generation Techniques Using Generative AI

Cited by: 43
Authors
Goyal, Mandeep [1]
Mahmoud, Qusay H. [1]
Affiliations
[1] Ontario Tech Univ, Dept Elect Comp & Software Engn, Oshawa, ON L1G 0C5, Canada
Keywords
synthetic data; LLMs; GANs; VAEs; generative AI; neural networks; machine learning
DOI
10.3390/electronics13173509
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Synthetic data are increasingly recognized for their potential to address serious real-world challenges in various domains. They offer a way to combat the data scarcity, privacy concerns, and algorithmic biases commonly encountered in machine learning applications. Synthetic data are designed to preserve the underlying patterns and behaviors of the original dataset while altering the actual content. The methods proposed in the literature to generate synthetic data range from large language models (LLMs), which are pre-trained on very large datasets, to generative adversarial networks (GANs) and variational autoencoders (VAEs). This study provides a systematic review of the techniques proposed in the literature for generating synthetic data, in order to identify their limitations and suggest potential future research areas. The findings indicate that while these technologies can generate synthetic data of specific data types, they still have drawbacks, such as computational requirements, training instability, and limited privacy-preserving measures, which restrict their real-world usability. Addressing these issues will facilitate the broader adoption of synthetic data generation techniques across various disciplines, thereby advancing machine learning and data-driven solutions.
Pages: 38