Dual Memory Units with Uncertainty Regulation for Weakly Supervised Video Anomaly Detection

被引：0

作者：

Zhou, Hang ^{[1
]}

Yu, Junqing ^{[1
]}

Yang, Wei ^{[1
]}

机构：

[1] Huazhong Univ Sci & Technol, Wuhan, Peoples R China

来源：

THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3 | 2023年

基金：

中国国家自然科学基金;

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Learning discriminative features for effectively separating abnormal events from normality is crucial for weakly supervised video anomaly detection (WS-VAD) tasks. Exist-ing approaches, both video and segment-level label oriented, mainly focus on extracting representations for anomaly data while neglecting the implication of normal data. We observe that such a scheme is sub-optimal, i.e., for better distinguishing anomaly one needs to understand what is a normal state, and may yield a higher false alarm rate. To address this issue, we propose an Uncertainty Regulated Dual Memory Units (UR-DMU) model to learn both the representations of normal data and discriminative features of abnormal data. To be specific, inspired by the traditional global and local structure on graph convolutional networks, we introduce a Global and Local Multi-Head Self Attention (GL-MHSA) module for the Transformer network to obtain more expressive embeddings for capturing associations in videos. Then, we use two memory banks, one additional abnormal memory for tackling hard samples, to store and separate abnormal and normal prototypes and maximize the margins between the two representations. Finally, we propose an uncertainty learning scheme to learn the normal data latent space, that is robust to noise from camera switching, object changing, scene transforming, etc. Extensive experiments on XD-Violence and UCF-Crime datasets demonstrate that our method outperforms the state-of-the-art methods by a sizable margin.

引用

页码：3769 / 3777

页数：9

共 41 条

[1]

Carneiro G., 2021, ICCV

[2] Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [J].

Carreira, Joao ;

Zisserman, Andrew .

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :4724-4733

[3] Data Uncertainty Learning in Face Recognition [J].

Chang, Jie ;

Lan, Zhonghao ;

Cheng, Changmao ;

Wei, Yichen .

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :5709-5718

[4]

Dosovitskiy A., 2021, ICLR

[5] Learning Spatiotemporal Features with 3D Convolutional Networks [J].

Du Tran ;

Bourdev, Lubomir ;

Fergus, Rob ;

Torresani, Lorenzo ;

Paluri, Manohar .

2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :4489-4497

[6] MIST: Multiple Instance Self-Training Framework for Video Anomaly Detection [J].

Feng, Jia-Chang ;

Hong, Fa-Ting ;

Zheng, Wei-Shi .

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :14004-14013

[7]

Ge Y, 2022, arXiv

[8] Memorizing Normality to Detect Anomaly: Memory-augmented Deep Autoencoder for Unsupervised Anomaly Detection [J].

Gong, Dong ;

Liu, Lingqiao ;

Le, Vuong ;

Saha, Budhaditya ;

Mansour, Moussa Reda ;

Venkatesh, Svetha ;

van den Hengel, Anton .

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :1705-1714

[9]

Hu Hongsheng, 2021, ARXIV

[10]

Ilse M, 2018, PR MACH LEARN RES, V80

← 1 2 3 4 5 →