Weakly-Supervised Video Anomaly Detection with MTDA-Net

Times cited: 3

Authors:

Wu, Huixin [1]

Yang, Mengfan [1]

Wei, Fupeng [1]

Shi, Ge [1]

Jiang, Wei [1]

Qiao, Yaqiong [1]

Dong, Hangcheng [2]

Affiliations:

[1] North China University of Water Resources and Electric Power, School of Information Engineering, Zhengzhou 450046, People's Republic of China

[2] Harbin Institute of Technology, School of Instrumentation Science and Engineering, Harbin 150001, People's Republic of China

Source:

ELECTRONICS | 2023 / Vol. 12 / Issue 22

Funding:

National Natural Science Foundation of China

Keywords:

weakly supervised; temporal modeling; anomaly detection

DOI:

10.3390/electronics12224623

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline code:

0812

Abstract:

Weakly supervised anomalous behavior detection is currently an active research area. Compared with semi-supervised anomalous behavior detection, weakly supervised learning eliminates the need to crop videos and avoids the difficulty semi-supervised methods have in handling long videos. Previous work has used graph convolution or self-attention mechanisms to model temporal relationships. However, these methods tend to model temporal relationships at a single scale and do not consider how different temporal relationships should be aggregated. In this paper, we propose MTDA-Net, a weakly supervised anomaly detection framework that emphasizes modeling different temporal relationships and enhancing semantic discrimination. To this end, we construct a new plug-and-play module, MTDA, which uses three branches, Multi-headed Attention (MHA), Temporal Shift (TS), and Dilated Aggregation (DA), to capture different temporal relationships. Specifically, the MHA branch models video information globally and projects the features into different semantic spaces to enhance their expressiveness and discriminability. The DA branch extracts temporal information at different scales via dilated convolution and captures the temporal features of local regions in the video. The TS branch fuses the features of adjacent frames at a local scale and enhances the information flow between them. MTDA-Net learns the temporal relationships between video segments on the different branches and builds powerful video representations from these relationships. Experimental results on the XD-Violence dataset show that MTDA-Net significantly improves the detection accuracy of abnormal behaviors.
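The abstract describes the three MTDA branches only at a high level. The NumPy sketch below is one possible reading, not the authors' implementation. It assumes segment-level features of shape (T, C); a TS branch that works like the Temporal Shift Module (a fraction of channels shifted one segment forward and another fraction one segment backward); a DA branch built from kernel-size-3 dilated convolutions at rates 1, 2 and 4 whose outputs are averaged; standard multi-head self-attention for MHA; and a fusion rule that simply sums the branches with a residual connection. All function names, shapes, dilation rates, the shift fraction and the fusion rule are hypothetical. Weights are random and nothing is trained.

```python
import numpy as np


def temporal_shift(x, shift_div=4):
    """TS-style branch (assumed TSM-like): move 1/shift_div of the channels one
    segment forward in time and another 1/shift_div one segment backward,
    zero-padding at the ends. x has shape (T, C)."""
    T, C = x.shape
    fold = C // shift_div
    out = np.zeros_like(x)
    out[1:, :fold] = x[:-1, :fold]                  # segment t receives t-1
    out[:-1, fold:2 * fold] = x[1:, fold:2 * fold]  # segment t receives t+1
    out[:, 2 * fold:] = x[:, 2 * fold:]             # other channels unchanged
    return out


def dilated_conv1d(x, w, dilation):
    """Temporal convolution with kernel size 3 and 'same' zero padding.
    x: (T, C_in), w: (3, C_in, C_out); tap k reads segment t + (k - 1) * dilation."""
    T = x.shape[0]
    xp = np.pad(x, ((dilation, dilation), (0, 0)))
    out = np.zeros((T, w.shape[2]))
    for k in range(3):
        out += xp[k * dilation:k * dilation + T] @ w[k]
    return out


def dilated_aggregation(x, weights, dilations=(1, 2, 4)):
    """DA-style branch (assumed): average dilated convolutions at several rates."""
    outs = [dilated_conv1d(x, w, d) for w, d in zip(weights, dilations)]
    return sum(outs) / len(outs)


def multi_head_self_attention(x, wq, wk, wv, wo, num_heads):
    """MHA branch: scaled dot-product self-attention over all T segments,
    with each head working in its own projected subspace."""
    T, C = x.shape
    dh = C // num_heads

    def split(m):  # (T, C) -> (num_heads, T, dh)
        return m.reshape(T, num_heads, dh).transpose(1, 0, 2)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(dh)  # (heads, T, T)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=-1, keepdims=True)         # softmax over segments
    heads = attn @ v                                  # (heads, T, dh)
    return heads.transpose(1, 0, 2).reshape(T, C) @ wo


def mtda_block(x, params, num_heads=4):
    """Hypothetical fusion: residual connection plus the sum of the three branches."""
    mha = multi_head_self_attention(x, *params["mha"], num_heads=num_heads)
    da = dilated_aggregation(x, params["da"])
    ts = temporal_shift(x)
    return x + mha + da + ts


rng = np.random.default_rng(0)
T, C = 32, 64  # toy sizes: 32 video segments with 64-dim features
x = rng.standard_normal((T, C))
params = {
    "mha": tuple(rng.standard_normal((C, C)) * 0.05 for _ in range(4)),
    "da": [rng.standard_normal((3, C, C)) * 0.05 for _ in range(3)],
}
y = mtda_block(x, params)
print(y.shape)  # (32, 64)
```

Every branch preserves the (T, C) shape, which is what allows a module like this to be inserted between existing layers in the plug-and-play manner the abstract describes. The attention branch relates every pair of segments, while the shift and dilated-convolution branches only look at nearby segments, mirroring the abstract's split between global and local temporal modeling.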

Pages: 14

References: 45 in total
