Inter-patch spatio-temporal relation prediction for video anomaly detection

Cited by: 0

Authors:

Hao Shen [1]
Lu Shi [2]
Linna Zhang [3]
Wanru Xu [1]
Yigang Cen [2]
Gaoyun An [3]

Affiliations:

[1] Beijing Jiaotong University, State Key Laboratory of Advanced Rail Autonomous Operation

[2] Beijing Jiaotong University, School of Computer Science and Technology

[3] Beijing Jiaotong University, Visual Intelligence +X International Cooperation Joint Laboratory of MOE

[4] Guizhou University, School of Mechanical Engineering

Source:

Signal, Image and Video Processing | 2025 / Vol. 19 / Issue 7

Keywords:

Video anomaly detection; Self-supervised learning; Pretext task

DOI:

10.1007/s11760-025-04156-x

Abstract:

Video anomaly detection (VAD), which aims to identify abnormalities within a specific context and timeframe, is crucial for intelligent video surveillance systems. Although recent deep learning-based VAD models have shown promising results by generating high-resolution frames, they often fail to preserve detailed spatial and temporal coherence in video frames. To tackle this issue, we propose a self-supervised learning approach for VAD built on an inter-patch relationship prediction task. Specifically, we introduce a two-branch vision transformer network that captures deep visual features of video frames along the spatial and temporal dimensions, which model appearance and motion patterns, respectively. In each dimension, the inter-patch relationship is decoupled into inter-patch similarity and the order information of each patch. To reduce memory consumption, we cast order prediction as a multi-label learning problem and inter-patch similarity prediction as an inter-patch distance matrix regression problem. Comprehensive experiments on three public benchmarks demonstrate the effectiveness of our method, which surpasses pixel-generation-based methods by a significant margin and also outperforms other self-supervised learning-based methods.
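
The two pretext targets named in the abstract can be illustrated with a small sketch. This is not the authors' implementation, and every function name below is hypothetical. It assumes that patches are shuffled by a random permutation; that the order target is an N x N 0/1 matrix (one row per shuffled patch, with a 1 at its original position) trained with per-entry binary cross-entropy, which is one plausible reading of "multi-label learning"; and that the similarity target is the Euclidean distance between the patches' original positions (grid coordinates for the spatial branch, frame indices for the temporal branch). Under this reading, the order head needs only N x N outputs rather than one class per each of the N! possible permutations, which is one way such a formulation could save memory.

```python
import math
import random

# Hypothetical helpers sketching the two pretext targets; not the authors' code.

def grid_coords(rows, cols):
    """Original (row, col) position of each patch in a frame (spatial branch)."""
    return [(r, c) for r in range(rows) for c in range(cols)]

def temporal_coords(num_frames):
    """Original time index of each frame-level patch (temporal branch)."""
    return [(t,) for t in range(num_frames)]

def shuffle_patches(num_patches, rng):
    """Return perm, where perm[k] is the original index of the patch at slot k."""
    perm = list(range(num_patches))
    rng.shuffle(perm)
    return perm

def order_targets(perm):
    """Assumed multi-label order target: an N x N 0/1 matrix whose row k has a
    single 1 at the original position of the k-th shuffled patch."""
    n = len(perm)
    return [[1.0 if j == perm[k] else 0.0 for j in range(n)] for k in range(n)]

def distance_targets(perm, coords):
    """Assumed similarity target: Euclidean distance between the ORIGINAL
    positions of every pair of shuffled patches (symmetric, zero diagonal)."""
    n = len(perm)
    return [[math.dist(coords[perm[a]], coords[perm[b]]) for b in range(n)]
            for a in range(n)]

def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def multilabel_bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy over every entry of the N x N order matrix."""
    total, count = 0.0, 0
    for z_row, y_row in zip(logits, targets):
        for z, y in zip(z_row, y_row):
            p = min(max(_sigmoid(z), eps), 1.0 - eps)
            total -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
            count += 1
    return total / count

def distance_mse(pred, target):
    """Mean squared error for regressing the inter-patch distance matrix."""
    sq = [(p - t) ** 2 for p_row, t_row in zip(pred, target)
          for p, t in zip(p_row, t_row)]
    return sum(sq) / len(sq)

if __name__ == "__main__":
    rng = random.Random(0)
    coords = grid_coords(2, 2)  # four spatial patches on a 2 x 2 grid
    perm = shuffle_patches(len(coords), rng)
    y_order = order_targets(perm)
    d_target = distance_targets(perm, coords)
    # A confident, correct order prediction yields a near-zero loss.
    logits = [[10.0 if y else -10.0 for y in row] for row in y_order]
    print("order BCE:", multilabel_bce(logits, y_order))
    print("distance MSE (perfect):", distance_mse(d_target, d_target))
```

For example, on a 2 x 2 grid with the identity permutation, `distance_targets` assigns a distance of 1 to horizontally adjacent patches and the square root of 2 to diagonal ones. The temporal branch reuses the same functions with one-dimensional coordinates from `temporal_coords`.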
