GAN-Based Generation of Synthetic Data for Vehicle Driving Events

Times cited: 6
Authors
Tamayo-Urgiles, Diego [1]
Sanchez-Gordon, Sandra [1]
Caraguay, Angel Leonardo Valdivieso [1]
Hernandez-Alvarez, Myriam [1]
Affiliation
[1] Escuela Politec Nacl, Dept Informat & Ciencias Comp, Edificio Sistemas, Quito 170525, Ecuador
Source
APPLIED SCIENCES-BASEL | 2024 / Vol. 14 / Issue 20
Keywords
synthetic data generation; generative adversarial networks; driving event data; time series synthesis; traffic accident risk level
DOI
10.3390/app14209269
Chinese Library Classification (CLC) number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Developing solutions to reduce traffic accidents requires experimentation and large amounts of data. However, due to confidentiality issues, not all datasets used in previous research are publicly available, and those that are available may be insufficient for research. Building datasets from real data is costly. Given this situation, this paper proposes a procedure for generating synthetic sequences of driving-event data using the Time-series GAN (TimeGAN) and Real-world Time Series GAN (RTSGAN) frameworks. First, a 15-feature driving-event dataset is constructed from real data; this dataset forms the basis for generating synthetic datasets with the two frameworks. The generated datasets are evaluated qualitatively with PCA and t-SNE visualizations and quantitatively with the discriminative and predictive scores defined in TimeGAN. The synthetic data are then used in an unsupervised algorithm to identify clusters representing vehicle crash risk levels, and subsequently in a supervised classification algorithm to predict risk-level categories. A comparison of the two frameworks shows that the data generated by RTSGAN achieve better scores than the data generated by TimeGAN. Furthermore, we demonstrate that a supervised classification model trained with synthetic data to predict the accident risk level can achieve accuracy comparable to that of models trained only on real data, confirming the usefulness of the generated data.
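In TimeGAN's evaluation protocol, the discriminative score is obtained by training a post-hoc classifier to distinguish real from synthetic sequences and reporting |accuracy - 0.5| on held-out data, so a score near 0 means the classifier cannot tell the two apart. The following Python sketch is a simplified illustration, not the TimeGAN implementation: TimeGAN uses a recurrent network as the post-hoc classifier, whereas here a nearest-centroid classifier on time-averaged features is swapped in, and the helper names (summarize, discriminative_score, toy_sequences) and the toy data are hypothetical.

```python
import random
from statistics import fmean

def summarize(seq):
    """Collapse a (time steps x features) sequence into per-feature means over time."""
    return [fmean(step[f] for step in seq) for f in range(len(seq[0]))]

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [fmean(col) for col in zip(*vectors)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def split(items, train_frac, rng):
    """Shuffle a copy of items and cut it into train/test parts."""
    items = list(items)
    rng.shuffle(items)
    cut = int(len(items) * train_frac)
    return items[:cut], items[cut:]

def discriminative_score(real, synthetic, train_frac=0.8, seed=0):
    """|accuracy - 0.5| of a post-hoc real-vs-synthetic classifier on held-out data.

    Simplification (assumption): a nearest-centroid classifier on time-averaged
    features stands in for TimeGAN's recurrent post-hoc classifier.
    """
    rng = random.Random(seed)
    real_train, real_test = split(map(summarize, real), train_frac, rng)
    syn_train, syn_test = split(map(summarize, synthetic), train_frac, rng)
    c_real, c_syn = centroid(real_train), centroid(syn_train)
    # Predict "real" only when a sample is strictly closer to the real centroid.
    correct = sum(sq_dist(x, c_real) < sq_dist(x, c_syn) for x in real_test)
    correct += sum(sq_dist(x, c_syn) <= sq_dist(x, c_real) for x in syn_test)
    accuracy = correct / (len(real_test) + len(syn_test))
    return abs(accuracy - 0.5)

def toy_sequences(n, center, rng, length=10, n_features=3):
    """Hypothetical toy data: n sequences of Gaussian noise around `center`."""
    return [[[rng.gauss(center, 1.0) for _ in range(n_features)]
             for _ in range(length)] for _ in range(n)]

rng = random.Random(1)
real = toy_sequences(200, 0.0, rng)
easy_fake = toy_sequences(200, 5.0, rng)  # clearly different from the real data
good_fake = toy_sequences(200, 0.0, rng)  # drawn from the same distribution
print(discriminative_score(real, easy_fake))
print(discriminative_score(real, good_fake))
```

With this toy data, the clearly distinguishable sequences yield a score of 0.5, while sequences drawn from the real distribution yield a score close to 0. Because time-averaging discards temporal structure, this simplified classifier can miss differences that a recurrent classifier would detect.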
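The final comparison in the abstract (a classifier trained on synthetic data versus one trained on real data) follows a pattern often called "train on synthetic, test on real" (TSTR): the same classifier is fitted once on real training data and once on synthetic data, and both are scored on the same held-out real test set. The Python sketch below is hypothetical. The abstract does not name the classifier, so a nearest-centroid classifier is used; the three risk levels, the two-feature toy data, and the small shift that stands in for generator output are invented; and the labels are assumed to be known (in the paper they come from the clustering step).

```python
import random
from collections import defaultdict
from statistics import fmean

def fit_centroids(X, y):
    """Nearest-centroid 'training': one mean vector per class label."""
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[label].append(x)
    return {label: [fmean(col) for col in zip(*xs)] for label, xs in groups.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(centroids,
               key=lambda label: sum((a - b) ** 2 for a, b in zip(x, centroids[label])))

def accuracy(centroids, X, y):
    return sum(predict(centroids, x) == t for x, t in zip(X, y)) / len(y)

def make_data(n_per_class, shift, rng):
    """Hypothetical toy data: risk levels 0 (low), 1 (medium), 2 (high), two features."""
    X, y = [], []
    for level in range(3):
        for _ in range(n_per_class):
            X.append([rng.gauss(3 * level + shift, 0.5), rng.gauss(3 * level, 0.5)])
            y.append(level)
    return X, y

rng = random.Random(42)
X_real_train, y_real_train = make_data(100, 0.0, rng)
X_real_test, y_real_test = make_data(50, 0.0, rng)  # held-out real data only
X_synth, y_synth = make_data(100, 0.2, rng)         # stand-in for generator output

trtr = accuracy(fit_centroids(X_real_train, y_real_train), X_real_test, y_real_test)
tstr = accuracy(fit_centroids(X_synth, y_synth), X_real_test, y_real_test)
print(f"train-real/test-real accuracy:  {trtr:.3f}")
print(f"train-synth/test-real accuracy: {tstr:.3f}")
```

The key design choice is that the test set contains only real samples that the generator never saw; scoring on synthetic data instead would overstate how well the synthetic-trained model transfers.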
Pages: 38