Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches

Cited by: 0

Authors:

Shen, Guodong [1]

Ouyang, Yuqi [1,2]

Lu, Junru [1]

Yang, Yixuan [1]

Sanchez, Victor [1]

Affiliations:

[1] Univ Warwick, Comp Sci Dept, Coventry CV4 7AL, England

[2] Sichuan Univ, Coll Comp Sci, Chengdu 610017, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Keywords:

Transformers; Multitasking; Computer vision; Pipelines; Decoding; Bidirectional control; Benchmark testing; Training; Periodic structures; Feature extraction; Video anomaly detection; vision transformer; ConvLSTM; bi-directional structure; single-task; multi-task;

DOI:

10.1109/TIP.2024.3512369

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Despite the prevailing transition from single-task to multi-task approaches in video anomaly detection, we observe that many adopt sub-optimal frameworks for individual proxy tasks. Motivated by this, we contend that optimizing single-task frameworks can advance both single- and multi-task approaches. Accordingly, we leverage middle-frame prediction as the primary proxy task, and introduce an effective hybrid framework designed to generate accurate predictions for normal frames and flawed predictions for abnormal frames. This hybrid framework is built upon a bi-directional structure that seamlessly integrates both vision transformers and ConvLSTMs. Specifically, we utilize this bi-directional structure to fully analyze the temporal dimension by predicting frames in both forward and backward directions, significantly boosting the detection stability. Given the transformer's capacity to model long-range contextual dependencies, we develop a convolutional temporal transformer that efficiently associates feature maps from all context frames to generate attention-based predictions for target frames. Furthermore, we devise a layer-interactive ConvLSTM bridge that facilitates the smooth flow of low-level features across layers and time-steps, thereby strengthening predictions with fine details. Anomalies are eventually identified by scrutinizing the discrepancies between target frames and their corresponding predictions. Several experiments conducted on public benchmarks affirm the efficacy of our hybrid framework, whether used as a standalone single-task approach or integrated as a branch in a multi-task approach. These experiments also underscore the advantages of merging vision transformers and ConvLSTMs for video anomaly detection. The implementation of our hybrid framework is available at https://github.com/SHENGUODONG19951126/ConvTTrans-ConvLSTM.
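The scoring step described in the abstract, where anomalies are identified from the discrepancy between a target frame and its predictions, can be illustrated with a minimal sketch. This is not the authors' implementation (see their repository for that). It assumes a PSNR-based error, a simple average of the forward and backward prediction quality, and min-max normalization over a clip. These choices are common in frame-prediction anomaly detection, but the abstract does not confirm any of them. All function names are hypothetical.

```python
import numpy as np


def psnr(target, prediction, max_val=1.0):
    """Peak signal-to-noise ratio between a target frame and a prediction.

    Assumes pixel values lie in [0, max_val].
    """
    mse = float(np.mean((target - prediction) ** 2))
    mse = max(mse, 1e-10)  # guard against division by zero for perfect predictions
    return 10.0 * np.log10(max_val ** 2 / mse)


def bidirectional_quality(target, forward_pred, backward_pred):
    """Hypothetical fusion: average the PSNR of the two directional predictions."""
    return 0.5 * (psnr(target, forward_pred) + psnr(target, backward_pred))


def anomaly_scores(targets, forward_preds, backward_preds):
    """Map per-frame prediction quality to anomaly scores in [0, 1].

    Low prediction quality (a flawed prediction) yields a score near 1.
    """
    quality = np.array([
        bidirectional_quality(t, f, b)
        for t, f, b in zip(targets, forward_preds, backward_preds)
    ])
    span = quality.max() - quality.min()
    if span == 0:
        # All frames are predicted equally well; nothing stands out.
        return np.zeros_like(quality)
    normalized = (quality - quality.min()) / span
    return 1.0 - normalized
```

Given lists of target frames and their forward and backward predictions as NumPy arrays of equal shape, `anomaly_scores` returns one score per frame. Frames whose predictions deviate most from the target receive the highest scores.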

Pages: 6865-6880

Page count: 16
