Dance with Self-Attention: A New Look of Conditional Random Fields on Anomaly Detection in Videos

Cited by: 37

Authors:

Purwanto, Didik [1]

Chen, Yie-Tarng [1]

Fang, Wen-Hsien [1]

Affiliations:

[1] Natl Taiwan Univ Sci & Technol, Taipei, Taiwan

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

DOI:

10.1109/ICCV48922.2021.00024

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

This paper proposes a novel weakly supervised approach for anomaly detection, which begins with a relation-aware feature extractor to capture the multi-scale convolutional neural network (CNN) features from a video. Afterwards, self-attention is integrated with conditional random fields (CRFs), the core of the network, to make use of the ability of self-attention in capturing the short-range correlations of the features and the ability of CRFs in learning the interdependencies of these features. Such a framework can learn not only the spatio-temporal interactions among the actors which are important for detecting complex movements, but also their short- and long-term dependencies across frames. Also, to deal with both local and non-local relationships of the features, a new variant of self-attention is developed by taking into consideration a set of cliques with different temporal localities. Moreover, a contrastive multi-instance learning scheme is considered to broaden the gap between the normal and abnormal instances, resulting in more accurate abnormal discrimination. Simulations reveal that the new method provides superior performance to the state-of-the-art works on the widespread UCF-Crime and ShanghaiTech datasets.

Pages: 173-183

Page count: 11
