Enhancing data efficiency for autonomous vehicles: Using data sketches for detecting driving anomalies

Cited by: 1
Authors
Indah, Debbie Aisiana [1]
Mwakalonge, Judith [1]
Comert, Gurcan [2]
Siuhi, Saidi [1]
Affiliations
[1] South Carolina State Univ, Dept Engn, 300 Coll Ave, Orangeburg, SC 29117 USA
[2] Benedict Coll, Dept Comp Sci & Engn, 1600 Harden St, Columbia, SC 29204 USA
Source
MACHINE LEARNING WITH APPLICATIONS | 2024 / Vol. 15
Funding
National Science Foundation (USA);
Keywords
Autonomous vehicles; Data sketches; Reservoir sampling sketches; Big data; Driving anomaly detection; BEHAVIOR; MODEL;
DOI
10.1016/j.mlwa.2024.100530
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Machine learning models for near-collision detection in autonomous vehicles promise enhanced predictive power. However, training them on large datasets presents storage and computational challenges, particularly on conventional computing systems. This paper addresses the problem of training anomaly detection models from large-scale vehicle trajectory datasets and adopts a reservoir sampling-based data sketching technique. Predetermined subset sizes ranging from 0.4% to 100% of the original data are used. A single-pass reservoir sampling algorithm is then applied to construct these data subsets efficiently. Subsequently, a Support Vector Machine (SVM) model is trained on these subsets, and its performance is assessed with several metrics, including accuracy, precision, recall, and F1-score. Experimental results on the HighD dataset, a comprehensive real-world collection of vehicle trajectories, confirm that our approach can achieve robust near-collision detection. With the full dataset, our model achieved an F1-score of 0.9998 for class 0 and 0.9984 for class 1. When the data was reduced to as little as 0.4% of the original size, the F1-score for class 0 remained at 0.9998, while the F1-score for class 1 was 0.7143. This demonstrates a capability to maintain relatively high performance even with a 99.6% reduction in data size. Moreover, precision and recall values ranged from 0.713 to 0.999 across the sketch sizes.
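The single-pass reservoir sampling step mentioned in the abstract can be illustrated with a minimal sketch. This is the textbook "Algorithm R" and is not taken from the paper, whose implementation is not shown in this record. The names `reservoir_sample` and `sketch_size` are hypothetical, and the 10,000-row stream is a stand-in for trajectory records, not HighD data.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int,
                     seed: Optional[int] = None) -> List[T]:
    """Draw k items uniformly at random from a stream in a single pass.

    Textbook Algorithm R (assumed here; the paper's exact variant is unknown).
    Only k items are held in memory, regardless of the stream length.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Item i replaces a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def sketch_size(n_rows: int, fraction: float) -> int:
    """Hypothetical helper: turn a fraction such as 0.004 (0.4%) into a row count."""
    return max(1, round(n_rows * fraction))


# Stand-in stream of 10,000 record indices, sketched down to 0.4%.
n_rows = 10_000
k = sketch_size(n_rows, 0.004)
sample = reservoir_sample(range(n_rows), k, seed=0)
print(k, len(sample))
```

In a pipeline like the one the abstract describes, the sampled rows would then be passed to a classifier such as an SVM, and per-class precision, recall, and F1-score would be computed on held-out data.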
Pages: 9
Related papers
38 in total
[1]  
Accenture, 2018, Conquering the data challenge in the race to autonomous vehicles
[2]   Mergeable Summaries [J].
Agarwal, Pankaj K. ;
Cormode, Graham ;
Huang, Zengfeng ;
Phillips, Jeff M. ;
Wei, Zhewei ;
Yi, Ke .
ACM TRANSACTIONS ON DATABASE SYSTEMS, 2013, 38 (04)
[3]  
Agrawal Aman Kumar, 2020, 2020 Fifth International Conference on Research in Computational Intelligence and Communication Networks (ICRCICN), P71, DOI 10.1109/ICRCICN50933.2020.9296156
[4]   Autonomous vehicles: challenges, opportunities, and future implications for transportation policies [J].
Bagloee, Saeed Asadi ;
Tavana, Madjid ;
Asadi, Mohsen ;
Oliver, Tracey .
JOURNAL OF MODERN TRANSPORTATION, 2016, 24 (04) :284-303
[5]   Support vector machines for predictive modeling in heterogeneous catalysis: A comprehensive introduction and overfitting investigation based on two real applications [J].
Baumes, L. A. ;
Serra, J. M. ;
Serna, P. ;
Corma, A. .
JOURNAL OF COMBINATORIAL CHEMISTRY, 2006, 8 (04) :583-596
[6]  
Chakraborty N, 2023, Arxiv, DOI arXiv:2301.03634
[7]   Anomaly Detection: A Survey [J].
Chandola, Varun ;
Banerjee, Arindam ;
Kumar, Vipin .
ACM COMPUTING SURVEYS, 2009, 41 (03)
[8]   Improving road safety with ensemble learning: Detecting driver anomalies using vehicle inbuilt cameras [J].
Chengula, Tumlumbe Juliana ;
Mwakalonge, Judith ;
Comert, Gurcan ;
Siuhi, Saidi .
MACHINE LEARNING WITH APPLICATIONS, 2023, 14
[9]   All-Distances Sketches, Revisited: HIP Estimators for Massive Graphs Analysis [J].
Cohen, Edith .
PODS'14: PROCEEDINGS OF THE 33RD ACM SIGMOD-SIGACT-SIGART SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2014, :88-99
[10]   Synopses for Massive Data: Samples, Histograms, Wavelets, Sketches [J].
Cormode, Graham ;
Garofalakis, Minos ;
Haas, Peter J. ;
Jermaine, Chris .
FOUNDATIONS AND TRENDS IN DATABASES, 2011, 4 (1-3) :1-294