Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Cited by: 2

Authors:

Habeb, Mohamed H. [1]

Salama, May [1]

Elrefaei, Lamiaa A. [1]

Affiliations:

[1] Benha Univ, Fac Engn Shoubra, Elect Engn Dept, Cairo 11629, Egypt

Source:

ALGORITHMS | 2024 / Vol. 17 / Issue 07

Keywords:

video anomaly detection; unsupervised learning; spatiotemporal modeling; large datasets; LOCALIZATION; RECOGNITION; HISTOGRAMS; EXTRACTION;

DOI:

10.3390/a17070286

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

This work introduces an unsupervised framework for video anomaly detection, leveraging a hybrid deep learning model that combines a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The proposed model addresses the challenges of anomaly detection in video surveillance by capturing both local and global relationships within video frames, a task that traditional convolutional neural networks (CNNs) often struggle with due to their localized field of view. We use a pre-trained ViT as an encoder for feature extraction, and the extracted features are then processed by the STR attention block to enhance the detection of spatiotemporal relationships among objects in videos. The novelty of this work lies in combining the ViT with STR attention to detect video anomalies effectively in large and heterogeneous datasets, which matters given the diverse environments and scenarios encountered in real-world surveillance. The framework was evaluated on three benchmark datasets, UCSD-Ped2, CUHK Avenue, and ShanghaiTech, achieving area under the receiver operating characteristic curve (AUC ROC) values of 95.6, 86.8, and 82.1, respectively. These results demonstrate the model's superior performance in detecting anomalies compared to state-of-the-art methods and its potential to significantly enhance automated video surveillance systems. To show the effectiveness of the proposed framework in detecting anomalies in extra-large datasets, we trained the model on a subset of the contemporary CHAD dataset that contains over 1 million frames, achieving AUC ROC values of 71.8 and 64.2 for CHAD-Cam 1 and CHAD-Cam 2, respectively, which outperform state-of-the-art techniques.

Pages: 31