Inter-patch spatio-temporal relation prediction for video anomaly detection

Cited by: 0
Authors
Hao Shen [1]
Lu Shi [2]
Linna Zhang [3]
Wanru Xu [1]
Yigang Cen [2]
Gaoyun An [3]
Affiliations
[1] Beijing Jiaotong University, State Key Laboratory of Advanced Rail Autonomous Operation
[2] Beijing Jiaotong University, The School of Computer Science and Technology
[3] Beijing Jiaotong University, Visual Intelligence +X International Cooperation Joint Laboratory of MOE
[4] Guizhou University, School of Mechanical Engineering
Keywords
Video anomaly detection; Self-supervised learning; Pretext task
DOI
10.1007/s11760-025-04156-x
Abstract
Video anomaly detection (VAD), which aims to identify abnormal events within a specific context and time frame, is crucial for intelligent video surveillance systems. Although recent deep learning-based VAD models have shown promising results by generating high-resolution frames, they often fail to preserve detailed spatial and temporal coherence across video frames. To tackle this issue, we propose a self-supervised learning approach for VAD built on an inter-patch relationship prediction task. Specifically, we introduce a two-branch vision transformer network that captures deep visual features of video frames; its two branches handle the spatial and temporal dimensions, modeling appearance and motion patterns, respectively. The inter-patch relationship in each dimension is decoupled into inter-patch similarity and the order information of each patch. To reduce memory consumption, we convert the order prediction task into a multi-label learning problem and the inter-patch similarity prediction task into an inter-patch distance matrix regression problem. Comprehensive experiments on three public benchmarks demonstrate the effectiveness of our method, which surpasses pixel-generation-based methods by a significant margin and also outperforms other self-supervised learning-based methods.
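The abstract names the two pretext targets but not their exact form. The following is a minimal sketch, under stated assumptions, of how such targets could be built for the spatial dimension of a single frame. It assumes that patches are shuffled and the order information is encoded as an N x N binary permutation matrix scored with one sigmoid per entry (a multi-label formulation); one plausible reading of the memory saving is that this needs only N*N outputs instead of a single classifier over all N! permutations (for 16 patches, 256 outputs versus 16! ≈ 2.1 x 10^13 classes). It further assumes that inter-patch similarity is encoded as a matrix of pairwise Euclidean distances between raw patch pixels, regressed with mean squared error. The helper names (`patchify`, `distance_target`, `order_target`, `bce_with_logits`, `mse`), the raw-pixel distance, and the permutation-matrix encoding are hypothetical illustrations, not the authors' implementation; the transformer itself is omitted.

```python
import math
import random
from typing import List, Sequence

Matrix = List[List[float]]


def patchify(frame: Sequence[Sequence[float]], p: int) -> List[List[float]]:
    """Split an H x W grayscale frame into non-overlapping p x p patches (row-major), each flattened."""
    h, w = len(frame), len(frame[0])
    if h % p or w % p:
        raise ValueError("frame height and width must be divisible by the patch size")
    return [
        [frame[top + i][left + j] for i in range(p) for j in range(p)]
        for top in range(0, h, p)
        for left in range(0, w, p)
    ]


def distance_target(patches: List[List[float]]) -> Matrix:
    """Hypothetical similarity target: N x N Euclidean distances between flattened patches."""
    return [[math.dist(a, b) for b in patches] for a in patches]


def order_target(perm: Sequence[int]) -> Matrix:
    """Hypothetical order target: row k has a single 1 at the original index of the patch in shuffled slot k."""
    n = len(perm)
    target = [[0.0] * n for _ in range(n)]
    for slot, original in enumerate(perm):
        target[slot][original] = 1.0
    return target


def bce_with_logits(logits: Matrix, target: Matrix) -> float:
    """Mean binary cross-entropy over all N*N labels, each scored by its own sigmoid (multi-label)."""
    total, count = 0.0, 0
    for z_row, y_row in zip(logits, target):
        for z, y in zip(z_row, y_row):
            # Numerically stable form of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))]
            total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count


def mse(pred: Matrix, target: Matrix) -> float:
    """Mean squared error for regressing a predicted distance matrix onto the target matrix."""
    diffs = [(a - b) ** 2 for p_row, t_row in zip(pred, target) for a, b in zip(p_row, t_row)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    random.seed(0)
    frame = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # toy 4 x 4 frame
    patches = patchify(frame, 2)  # 4 patches of 4 pixels each
    perm = list(range(len(patches)))
    random.shuffle(perm)
    shuffled = [patches[k] for k in perm]  # the token order the network would see

    dist = distance_target(shuffled)  # distances listed in the shuffled (input) order
    order = order_target(perm)
    n = len(patches)
    # An untrained predictor with all-zero outputs, used only to show the losses run.
    zeros = [[0.0] * n for _ in range(n)]
    print("order loss at zero logits:", bce_with_logits(zeros, order))
    print("distance loss at zero prediction:", mse(zeros, dist))
    for n_patches in (4, 16, 49):
        print(n_patches, "patches:", n_patches ** 2, "labels vs", math.factorial(n_patches), "permutations")
```

The same target construction could in principle be applied along the temporal dimension (patches at one location across consecutive frames); that extension is likewise an assumption here rather than something the abstract specifies.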
Related papers
50 results in total
  • [21] DAST-Net: Dense visual attention augmented spatio-temporal network for unsupervised video anomaly detection
    Kommanduri, Rangachary
    Ghorai, Mrinmoy
    NEUROCOMPUTING, 2024, 579
  • [22] Video anomaly detection based on cross-frame prediction mechanism and spatio-temporal memory-enhanced pseudo-3D encoder
    Wen, Xiaopeng
    Lai, Huicheng
    Gao, Guxue
    Xiao, Yang
    Wang, Tongguan
    Jia, Zhenhong
    Wang, Liejun
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 126
  • [23] Spatio-Temporal Catcher: a Self-Supervised Transformer for Deepfake Video Detection
    Li, Maosen
    Li, Xurong
    Yu, Kun
    Deng, Cheng
    Huang, Heng
    Mao, Feng
    Xue, Hui
    Li, Minghao
    PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2023, 2023 : 8707 - 8718
  • [24] Video Anomaly Detection Based on Optical Flow Feature Enhanced Spatio-Temporal Feature Network FusionNet-LSTM-G
    Song, Jun-Fang
    Zhao, Hai-Li
    Wen, Duo-Yang
    Xu, Xiao-Yu
    IEEE ACCESS, 2022, 10 : 130314 - 130325
  • [26] Video representation learning by identifying spatio-temporal transformations
    Geng, Sheng
    Zhao, Shimin
    Liu, Hu
    APPLIED INTELLIGENCE, 2022, 52 (06) : 6613 - 6622
  • [28] Unsupervised anomalous event detection in videos using spatio-temporal inter-fused autoencoder
    Aslam, Nazia
    Kolekar, Maheshkumar H.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (29) : 42457 - 42482
  • [29] Spatio-temporal context analysis within video volumes for anomalous-event detection and localization
    Li, Nannan
    Wu, Xinyu
    Xu, Dan
    Guo, Huiwen
    Feng, Wei
    NEUROCOMPUTING, 2015, 155 : 309 - 319
  • [30] STAD-AI: Spatio-Temporal Anomaly Detection in Videos with Attentive Dual-Stage Integration
    Kommanduri, Rangachary
    Ghorai, Mrinmoy
    NEUROCOMPUTING, 2025, 634