Generative Adversarial Networks for Abnormal Event Detection in Videos Based on Self-Attention Mechanism

Cited by: 15
Authors
Zhang, Weichao [1 ,2 ]
Wang, Guanjun [1 ,2 ,3 ]
Huang, Mengxing [1 ,2 ]
Wang, Hongyu [1 ,4 ]
Wen, Shaoping [1 ,2 ]
Affiliations
[1] Hainan Univ, State Key Lab Marine Resource Utilizat South Chin, Haikou 570228, Hainan, Peoples R China
[2] Hainan Univ, Dept Informat & Commun Engn, Haikou 570228, Hainan, Peoples R China
[3] Huazhong Univ Sci & Technol, Wuhan Natl Lab Optoelect, Wuhan 430074, Peoples R China
[4] Hainan Univ, Dept Comp & Cyberspace Secur, Haikou 570228, Hainan, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Streaming media; Event detection; Anomaly detection; Feature extraction; Generative adversarial networks; Generators; Training; Abnormal event detection; generative adversarial networks (GANs); self-attention; video understanding; ANOMALY DETECTION; HISTOGRAMS;
DOI
10.1109/ACCESS.2021.3110798
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Unsupervised anomaly detection defines an abnormal event as one that does not conform to expected behavior. A pioneering approach in this field leverages the difference between a future frame predicted by a generative adversarial network (GAN) and its ground truth to detect abnormal events. Building on that work, we improve the ability of the video prediction framework to detect abnormal events by enlarging the gap between its prediction quality for normal and abnormal events. We design a GAN that incorporates super-resolution and a self-attention mechanism: the generator is an auto-encoder combining dense residual networks with self-attention, and the discriminator introduces self-attention on top of a relativistic discriminator. To predict higher-quality future frames for normal events, we constrain motion in video prediction by fusing optical flow with the gradient difference between frames, and we introduce a perceptual constraint to enrich the texture details of predicted frames. The AUC of our method reaches 89.2% on the CUHK Avenue dataset and 75.7% on the ShanghaiTech dataset, outperforming most existing methods. In addition, we propose a processing pipeline that enables real-time anomaly detection in videos; our video prediction framework runs at an average of 37 frames per second. Among real-time methods for abnormal event detection in videos, our approach is competitive with the state-of-the-art methods.
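As the abstract describes, the framework follows the future-frame-prediction paradigm: a frame is flagged as abnormal when it differs strongly from its GAN-predicted version. A minimal sketch of that scoring step is below, using PSNR between prediction and ground truth with per-video min-max normalization, as is common in this line of work. The function names and the toy data are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between a predicted frame and its ground truth."""
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)

def regularity_scores(psnr_values):
    """Min-max normalize PSNRs over one video; a low score suggests an abnormal frame."""
    p = np.asarray(psnr_values, dtype=np.float64)
    return (p - p.min()) / (p.max() - p.min() + 1e-8)

# Toy example: frame 2 is predicted poorly (large error -> low PSNR -> low score),
# simulating an abnormal event that the predictor fails to reconstruct.
rng = np.random.default_rng(0)
gt = [rng.random((8, 8)) for _ in range(4)]
preds = [g + 0.01 * rng.standard_normal(g.shape) for g in gt]
preds[2] = gt[2] + 0.5 * rng.standard_normal(gt[2].shape)
scores = regularity_scores([psnr(p, g) for p, g in zip(preds, gt)])
```

Frames whose score falls below a chosen threshold would be reported as abnormal; the paper's contribution is to widen this score gap by improving prediction quality for normal events only.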
Pages: 124847-124860
Page count: 14