Enhancing Video Anomaly Detection Using a Transformer Spatiotemporal Attention Unsupervised Framework for Large Datasets

Cited by: 2
Authors
Habeb, Mohamed H. [1]
Salama, May [1]
Elrefaei, Lamiaa A. [1]
Affiliations
[1] Benha Univ, Fac Engn Shoubra, Elect Engn Dept, Cairo 11629, Egypt
Keywords
video anomaly detection; unsupervised learning; spatiotemporal modeling; large datasets; LOCALIZATION; RECOGNITION; HISTOGRAMS; EXTRACTION;
DOI
10.3390/a17070286
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This work introduces an unsupervised framework for video anomaly detection that uses a hybrid deep learning model combining a vision transformer (ViT) with a convolutional spatiotemporal relationship (STR) attention block. The proposed model addresses the challenges of anomaly detection in video surveillance by capturing both local and global relationships within video frames, a task that traditional convolutional neural networks (CNNs) often struggle with because of their localized field of view. We use a pre-trained ViT as an encoder for feature extraction; its output is then processed by the STR attention block to better capture spatiotemporal relationships among objects in videos. The novelty of this work lies in combining the ViT with STR attention to detect video anomalies effectively in large and heterogeneous datasets, which is important given the diverse environments and scenarios encountered in real-world surveillance. The framework was evaluated on three benchmark datasets: UCSD-Ped2, CUHK Avenue, and ShanghaiTech. It achieved area under the receiver operating characteristic curve (AUC ROC) values of 95.6, 86.8, and 82.1, respectively, outperforming state-of-the-art methods and showing its potential to enhance automated video surveillance systems. To show the effectiveness of the proposed framework on extra-large datasets, we trained the model on a subset of the large, contemporary CHAD dataset that contains over 1 million frames, achieving AUC ROC values of 71.8 and 64.2 for CHAD-Cam 1 and CHAD-Cam 2, respectively, which outperform the state-of-the-art techniques.
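The AUC ROC metric quoted in the abstract can be illustrated with a minimal sketch. It assumes frame-level binary labels (1 = anomalous, 0 = normal) and one anomaly score per frame. The abstract does not say how the framework derives its scores, so the function name `auc_roc` and the sample values below are hypothetical. The function returns a fraction in [0, 1]; the abstract's figures appear to be on a 0 to 100 scale.

```python
def auc_roc(scores, labels):
    """Frame-level AUC ROC via the pairwise-ranking definition.

    Returns the probability that a randomly chosen anomalous frame
    receives a higher score than a randomly chosen normal frame,
    counting ties as one half. Quadratic in the number of frames,
    so it is meant for illustration only.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one normal and one anomalous frame")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical per-frame anomaly scores and ground-truth labels.
example_scores = [0.1, 0.2, 0.8, 0.3, 0.9, 0.4]
example_labels = [0, 0, 1, 1, 1, 0]
example_auc = auc_roc(example_scores, example_labels)
```

For large videos, a sort-based implementation would avoid comparing every anomalous frame with every normal frame; the pairwise form is used here only because it mirrors the definition directly.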
Pages: 31
Related papers
50 in total
  • [31] Transformer with Spatio-Temporal Representation for Video Anomaly Detection
    Sun, Xiaohu
    Chen, Jinyi
    Shen, Xulin
    Li, Hongjun
    STRUCTURAL, SYNTACTIC, AND STATISTICAL PATTERN RECOGNITION, S+SSPR 2022, 2022, 13813 : 213 - 222
  • [32] Preprocessing and Framework for Unsupervised Anomaly Detection in IoT: Work on Progress
    Kurniabudi
    Purnama, Benni
    Sharipuddin
    Stiawan, Deris
    Darmawijoyo
    Budiarto, Rahmat
    2018 INTERNATIONAL CONFERENCE ON ELECTRICAL ENGINEERING AND COMPUTER SCIENCE (ICECOS), 2018, : 345 - 350
  • [33] LogAttn: Unsupervised Log Anomaly Detection with an AutoEncoder Based Attention Mechanism
    Zhang, Linming
    Li, Wenzhong
    Zhang, Zhijie
    Lu, Qingning
    Hou, Ce
    Hu, Peng
    Gui, Tong
    Lu, Sanglu
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT III, 2021, 12817 : 222 - 235
  • [34] Detection of Salient Events in Large Datasets of Underwater Video
    Gebali, Aleya
    Albu, Alexandra Branzan
    Hoeberechts, Maia
    2012 OCEANS, 2012,
  • [35] Enhancing Critical Infrastructure Security: Unsupervised Learning Approaches for Anomaly Detection
    Pinto, Andrea
    Herrera, Luis-Carlos
    Donoso, Yezid
    Gutierrez, Jairo A.
    INTERNATIONAL JOURNAL OF COMPUTATIONAL INTELLIGENCE SYSTEMS, 2024, 17 (01)
  • [36] Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection
    Feng, Xinyang
    Song, Dongjin
    Chen, Yuncong
    Chen, Zhengzhang
    Ni, Jingchao
    Chen, Haifeng
    PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021, 2021, : 5546 - 5554
  • [37] An Unsupervised Framework for Online Spatiotemporal Detection of Activities of Daily Living by Hierarchical Activity Models
    Negin, Farhood
    Bremond, Francois
    SENSORS, 2019, 19 (19)
  • [38] Memory-guided representation matching for unsupervised video anomaly detection
    Tao, Yiran
    Hu, Yaosi
    Chen, Zhenzhong
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2024, 101
  • [39] Deep Multi-view Representation Learning for Video Anomaly Detection Using Spatiotemporal Autoencoders
    Deepak, K.
    Srivathsan, G.
    Roshan, S.
    Chandrakala, S.
    CIRCUITS SYSTEMS AND SIGNAL PROCESSING, 2021, 40 (03) : 1333 - 1349