Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Times cited: 2

Authors:

Habeb, Mohamed H. [1]

Salama, May [1]

Elrefaei, Lamiaa A. [1]

Affiliation:

[1] Benha Univ, Fac Engn Shoubra, Elect Engn Dept, Cairo 11629, Egypt

Source:

ALGORITHMS | 2024 / Vol. 17 / Issue 07

Keywords:

video anomaly detection; unsupervised learning; spatiotemporal modeling; large datasets; LOCALIZATION; RECOGNITION; HISTOGRAMS; EXTRACTION

DOI:

10.3390/a17070286

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This work introduces an unsupervised framework for video anomaly detection based on a hybrid deep learning model that combines a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The proposed model addresses the challenges of anomaly detection in video surveillance by capturing both local and global relationships within video frames, a task that traditional convolutional neural networks (CNNs) often struggle with because of their localized field of view. We use a pre-trained ViT as an encoder for feature extraction; the extracted features are then processed by the STR attention block to enhance the detection of spatiotemporal relationships among objects in videos. The novelty of this work lies in combining the ViT with STR attention to detect video anomalies effectively in large and heterogeneous datasets, which is important given the diverse environments and scenarios encountered in real-world surveillance. The framework was evaluated on three benchmark datasets: UCSD-Ped2, CUHK Avenue, and ShanghaiTech. It achieved area under the receiver operating characteristic curve (AUC ROC) values of 95.6, 86.8, and 82.1, respectively, outperforming state-of-the-art methods and showing its potential to significantly enhance automated video surveillance systems. To show the effectiveness of the proposed framework on extra-large datasets, we trained the model on a subset of the contemporary CHAD dataset, which contains over 1 million frames, achieving AUC ROC values of 71.8 and 64.2 for CHAD-Cam 1 and CHAD-Cam 2, respectively, which outperforms state-of-the-art techniques.
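
Illustrative note on the metric: the abstract reports results as AUC ROC but does not say how per-frame anomaly scores are produced or post-processed. The following is a minimal sketch, assuming each frame has a binary ground-truth label (1 = anomalous) and a scalar anomaly score. The function name `roc_auc` and the sample values are hypothetical and are not taken from the paper.

```python
def roc_auc(labels, scores):
    """AUC ROC as a rank statistic: the probability that a randomly chosen
    anomalous frame receives a higher score than a randomly chosen normal
    frame, with ties counted as half a win.

    labels: iterable of 0 (normal) or 1 (anomalous), one per frame.
    scores: iterable of floats, higher means more anomalous (an assumption).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous frame")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: four frames, two of them anomalous.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"AUC ROC = {toy_auc * 100:.1f}")
```

The pairwise loop is quadratic in the number of frames and is meant only to make the definition concrete. For datasets of the size mentioned in the abstract, a sorted-rank implementation or an established library routine would be the practical choice.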


Pages: 31
