Crash data augmentation using variational autoencoder

Cited by: 115
Authors
Islam, Zubayer [1]
Abdel-Aty, Mohamed [1]
Cai, Qing [1]
Yuan, Jinghui [1]
Affiliation
[1] Univ Cent Florida, Dept Civil Environm & Construct Engn, Orlando, FL 32816 USA
Keywords
Variational autoencoder; Data augmentation; Crash prediction; Risk prediction; SMOTE
DOI
10.1016/j.aap.2020.105950
CLC number
TB18 [Ergonomics]
Discipline code
1201
Abstract
In this paper, we present a data augmentation technique to reproduce crash data. Datasets comprising crash and non-crash events are extremely imbalanced; for instance, the dataset used in this paper contains only 625 crash events against over 6.5 million non-crash events. As a result, learning algorithms tend to perform poorly on such datasets. We used a variational autoencoder (VAE) to encode all events into a latent space. After training, the model successfully separated crash and non-crash events. To generate data, we sampled from the region of the latent space containing crash events. The generated data were compared with the real data from several statistical perspectives: the t-test, Levene's test and the Kolmogorov-Smirnov test showed that the generated data were statistically similar to the real data. The generated data were also compared with data produced by minority oversampling techniques such as SMOTE and ADASYN, as well as by a generative adversarial network (GAN) framework. Crash prediction models based on Logistic Regression (LR), Support Vector Machine (SVM) and Artificial Neural Network (ANN) were used to compare the data generated by the different oversampling techniques. Overall, the VAE performed better than the other data augmentation methods. Compared with SMOTE, specificity improved by 8% for VAE-LR and 4% for VAE-SVM; compared with ADASYN, sensitivity improved by 6% and 5%, respectively. Moreover, VAE-generated data also help overcome the overfitting problem of SMOTE and ADASYN, since there is flexibility in choosing the decision boundary.
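The abstract does not spell out the sampling procedure, so the following is only a minimal NumPy sketch of one plausible reading of "sampling from the latent space containing crash data": draw latent codes from the approximate posteriors of encoded crash events using the reparameterization trick, then pass them through the decoder. The encoder outputs, the linear stand-in decoder and all function names (`reparameterize`, `kl_to_standard_normal`, `generate_crash_samples`) are hypothetical. The KL term is the standard closed-form regularizer for a diagonal Gaussian posterior, included because it is what keeps a VAE latent space smooth enough to sample from.

```python
import numpy as np


def reparameterize(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, I) (VAE reparameterization trick)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """Closed-form KL(N(mu, diag(exp(log_var))) || N(0, I)), one value per row.

    This is the regularizer in the VAE training loss; it is not used at generation time.
    """
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=1)


def generate_crash_samples(crash_mu, crash_log_var, decode, n_samples, rng):
    """Sample latent codes around encoded crash events and decode them (assumed scheme).

    crash_mu, crash_log_var: encoder outputs for the real crash events, shape (n_crash, d).
    decode: hypothetical trained decoder mapping (n, d) latent codes to (n, p) features.
    """
    # Pick one real crash event per synthetic sample, uniformly at random.
    idx = rng.integers(0, crash_mu.shape[0], size=n_samples)
    # Sample from that event's approximate posterior in latent space.
    z = reparameterize(crash_mu[idx], crash_log_var[idx], rng)
    return decode(z)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    latent_dim, n_features, n_crash = 4, 6, 625

    # Hypothetical stand-ins for a trained VAE: random encoder outputs and a linear decoder.
    crash_mu = rng.standard_normal((n_crash, latent_dim))
    crash_log_var = np.full((n_crash, latent_dim), -2.0)
    weights = rng.standard_normal((latent_dim, n_features))

    def decode(z):
        return z @ weights

    synthetic = generate_crash_samples(crash_mu, crash_log_var, decode, 1000, rng)
    print(synthetic.shape)  # (1000, 6)
    print(kl_to_standard_normal(crash_mu, crash_log_var).mean())
```

Sampling per crash event keeps synthetic points close to regions the encoder actually assigned to crashes. The generated rows could then be checked against the real crash records with `scipy.stats.ttest_ind`, `scipy.stats.levene` and `scipy.stats.ks_2samp`, which implement the three tests named in the abstract.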
Pages: 13