Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Cited by: 2
Authors
Habeb, Mohamed H. [1 ]
Salama, May [1 ]
Elrefaei, Lamiaa A. [1 ]
Affiliations
[1] Benha Univ, Fac Engn Shoubra, Elect Engn Dept, Cairo 11629, Egypt
Keywords
video anomaly detection; unsupervised learning; spatiotemporal modeling; large datasets; localization; recognition; histograms; extraction
DOI
10.3390/a17070286
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This work introduces an unsupervised framework for video anomaly detection that leverages a hybrid deep learning model combining a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The proposed model addresses the challenges of anomaly detection in video surveillance by capturing both local and global relationships within video frames, a task that traditional convolutional neural networks (CNNs) often struggle with due to their limited receptive fields. A pre-trained ViT is used as an encoder for feature extraction, and its output is processed by the STR attention block to enhance the detection of spatiotemporal relationships among objects in videos. The novelty of this work lies in combining the ViT with STR attention to detect video anomalies effectively in large and heterogeneous datasets, which is important given the diverse environments and scenarios encountered in real-world surveillance. The framework was evaluated on three benchmark datasets, i.e., UCSD Ped2, CUHK Avenue, and ShanghaiTech, achieving area under the receiver operating characteristic curve (AUC ROC) values of 95.6%, 86.8%, and 82.1%, respectively; this demonstrates superior anomaly detection performance compared to state-of-the-art methods and showcases the model's potential to significantly enhance automated video surveillance systems. To show the effectiveness of the proposed framework on extra-large datasets, the model was also trained on a subset of the large, recent CHAD dataset, which contains over 1 million frames, achieving AUC ROC values of 71.8% and 64.2% for CHAD-Cam 1 and CHAD-Cam 2, respectively, outperforming state-of-the-art techniques.
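The abstract describes the architecture only at a high level (frame features from a pre-trained ViT encoder, re-weighted by a convolutional spatiotemporal attention block, scored per frame). The following minimal PyTorch sketch illustrates one way such a pipeline could be wired together; the module names STRAttention and ViTSTRDetector, the use of timm's ViT-B/16 as a frozen encoder, the clip tensor layout, and the feature-reconstruction scoring head are all illustrative assumptions, not the authors' published implementation.

# Hypothetical sketch (not the authors' code): a frozen, pre-trained ViT encodes each
# frame, a convolutional spatiotemporal attention block mixes information across the
# clip, and a per-frame anomaly score is read out as feature-reconstruction error.
import torch
import torch.nn as nn
import timm


class STRAttention(nn.Module):
    """Illustrative convolutional spatiotemporal attention over per-frame ViT token maps."""

    def __init__(self, channels: int):
        super().__init__()
        # A 3D convolution mixes the time axis with the 2D token grid and produces
        # a sigmoid gate that re-weights the ViT features.
        self.gate = nn.Sequential(
            nn.Conv3d(channels, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        return x * self.gate(x)


class ViTSTRDetector(nn.Module):
    def __init__(self):
        super().__init__()
        # Pre-trained ViT-B/16 used as a frozen feature extractor (weights download on first use).
        self.vit = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
        for p in self.vit.parameters():
            p.requires_grad = False
        self.attn = STRAttention(channels=768)
        # Toy scoring head (assumption): reconstruct the ViT features from the attended
        # features and treat a large reconstruction error as an anomaly indicator.
        self.decoder = nn.Conv3d(768, 768, kernel_size=1)

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (batch, time, 3, 224, 224) -> per-frame anomaly scores (batch, time)
        b, t = clip.shape[:2]
        tokens = self.vit.forward_features(clip.flatten(0, 1))  # (b*t, 197, 768)
        patches = tokens[:, 1:, :]                               # drop the CLS token
        h = w = int(patches.shape[1] ** 0.5)                     # 14x14 token grid
        feats = patches.transpose(1, 2).reshape(b, t, 768, h, w).permute(0, 2, 1, 3, 4)
        attended = self.attn(feats)
        recon = self.decoder(attended)
        return ((recon - feats) ** 2).mean(dim=(1, 3, 4))


scores = ViTSTRDetector()(torch.randn(1, 8, 3, 224, 224))  # e.g., one 8-frame clip

Under these assumptions, higher per-frame scores would flag candidate anomalies, and a frame-level AUC ROC such as the values reported in the abstract would be computed by comparing those scores against ground-truth frame labels.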
Pages: 31
Related Papers
50 records in total
  • [21] Unsupervised video anomaly detection in UAVs: a new approach based on learning and inference
    Liu, Gang
    Shu, Lisheng
    Yang, Yuhui
    Jin, Chen
    FRONTIERS IN SUSTAINABLE CITIES, 2023, 5
  • [22] Learning a multi-cluster memory prototype for unsupervised video anomaly detection
    Wu, Yuntao
    Zeng, Kun
    Li, Zuoyong
    Peng, Zhonghua
    Chen, Xiaobo
    Hu, Rong
    INFORMATION SCIENCES, 2025, 686
  • [23] Enhancing Unsupervised Anomaly Detection With Score-Guided Network
    Huang, Zongyuan
    Zhang, Baohua
    Hu, Guoqiang
    Li, Longyuan
    Xu, Yanyan
    Jin, Yaohui
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14754 - 14769
  • [24] Enhancing video anomaly detection with learnable memory network: A new approach to memory-based auto-encoders
    Wang, Zhiqiang
    Gu, Xiaojing
    Gu, Xingsheng
    Hu, Jingyu
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2024, 241
  • [25] A Novel Unsupervised Video Anomaly Detection Framework Based on Optical Flow Reconstruction and Erased Frame Prediction
    Huang, Heqing
    Zhao, Bing
    Gao, Fei
    Chen, Penghui
    Wang, Jun
    Hussain, Amir
    SENSORS, 2023, 23 (10)
  • [26] Spatial-temporal graph attention network for video anomaly detection
    Chen, Haoyang
    Mei, Xue
    Ma, Zhiyuan
    Wu, Xinhong
    Wei, Yachuan
    IMAGE AND VISION COMPUTING, 2023, 131
  • [27] Unsupervised Anomaly Detection and Localization Based on Deep Spatiotemporal Translation Network
    Ganokratanaa, Thittaporn
    Aramvith, Supavadee
    Sebe, Nicu
    IEEE ACCESS, 2020, 8 : 50312 - 50329
  • [28] Learning Anomalies with Normality Prior for Unsupervised Video Anomaly Detection
    Shi, Haoyue
    Wang, Le
    Zhou, Sanping
    Hua, Gang
    Tang, Wei
    COMPUTER VISION - ECCV 2024, PT VI, 2025, 15064 : 163 - 180
  • [29] Transformer Based Spatial-Temporal Extraction Model for Video Anomaly Detection
    Wang, Zhiqiang
    Gu, Xiaojing
    Gu, Xingsheng
    2024 8TH INTERNATIONAL CONFERENCE ON ROBOTICS, CONTROL AND AUTOMATION, ICRCA 2024, 2024, : 370 - 374
  • [30] Spatiotemporal consistency-enhanced network for video anomaly detection
    Hao, Yi
    Li, Jie
    Wang, Nannan
    Wang, Xiaoyu
    Gao, Xinbo
    PATTERN RECOGNITION, 2022, 121