A Systematic Review of Synthetic Data Generation Techniques Using Generative AI

Cited by: 39
Authors
Goyal, Mandeep [1]
Mahmoud, Qusay H. [1]
Affiliations
[1] Ontario Tech Univ, Dept Elect Comp & Software Engn, Oshawa, ON L1G 0C5, Canada
Keywords
synthetic data; LLMs; GANs; VAEs; generative AI; neural networks; machine learning
DOI
10.3390/electronics13173509
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Synthetic data are increasingly being recognized for their potential to address serious real-world challenges in various domains. They provide innovative solutions to combat the data scarcity, privacy concerns, and algorithmic biases commonly encountered in machine learning applications. Synthetic data preserve the underlying patterns and behaviors of the original dataset while altering the actual content. The methods proposed in the literature to generate synthetic data range from large language models (LLMs), which are pre-trained on very large datasets, to generative adversarial networks (GANs) and variational autoencoders (VAEs). This study provides a systematic review of the techniques proposed in the literature for generating synthetic data, with the aim of identifying their limitations and suggesting potential future research areas. The findings indicate that while these technologies can generate synthetic data of specific data types, they still have drawbacks, such as computational requirements, training stability, and privacy-preserving measures, which limit their real-world usability. Addressing these issues will facilitate the broader adoption of synthetic data generation techniques across various disciplines, thereby advancing machine learning and data-driven solutions.
Pages: 38