Integrating Spatial and Temporal Information for Violent Activity Detection from Video Using Deep Spiking Neural Networks

Cited by: 4
Authors
Wang, Xiang [1 ]
Yang, Jie [1 ]
Kasabov, Nikola K. [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200400, Peoples R China
[2] Auckland Univ Technol, Knowledge Engn & Discovery Res Inst, Auckland 1020, New Zealand
Keywords
violence detection; deep learning; spiking neural network; optical flow; spatial and temporal analysis;
DOI
10.3390/s23094532
CLC classification number
O65 [Analytical Chemistry];
Discipline classification codes
070302 ; 081704 ;
Abstract
Increasing violence in workplaces such as hospitals poses a serious challenge to public safety. However, visually monitoring large volumes of video data in real time is time- and labor-consuming. Automatic and timely detection of violent activity in video is therefore vital, especially for small monitoring systems. This paper proposes a two-stream deep learning architecture for video violent activity detection named SpikeConvFlowNet. First, RGB frames and their optical flow data are used as the inputs of the two streams to extract the spatiotemporal features of videos. The spatiotemporal features from the two streams are then concatenated and fed to the classifier for the final decision. Each stream is a supervised neural network consisting of multiple convolutional spiking and pooling layers. The convolutional layers extract high-quality spatial features within frames, while the spiking neurons efficiently extract temporal features across frames by retaining historical information. The spiking-neuron-based optical flow further strengthens the extraction of critical motion information. Combining these advantages enhances the performance and efficiency of recognizing violent actions. Experimental results on public datasets demonstrate that, compared with the latest methods, this approach greatly reduces the number of parameters and achieves higher inference efficiency with limited accuracy loss. It is a potential solution for applications on embedded devices that provide low computing power but require fast processing speeds.
Pages: 17
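The abstract above describes a two-stream architecture in which an RGB stream and an optical-flow stream, each built from convolutional and spiking layers, produce features that are concatenated and classified. The following is a minimal, illustrative sketch of such a two-stream convolutional spiking network, assuming PyTorch; the layer widths, LIF parameters, rate-coded readout, and linear classifier are placeholder assumptions for illustration and are not taken from the paper.

```python
# Illustrative sketch (not the authors' SpikeConvFlowNet code), assuming PyTorch.
# Layer sizes, the LIF decay/threshold, and the classifier are hypothetical.
import torch
import torch.nn as nn


class LIFNeuron(nn.Module):
    """Leaky integrate-and-fire layer: integrates input over time steps and
    emits a binary spike when the membrane potential crosses the threshold."""
    def __init__(self, decay=0.9, threshold=1.0):
        super().__init__()
        self.decay, self.threshold = decay, threshold

    def forward(self, x, mem):
        mem = self.decay * mem + x               # leaky integration of input current
        spike = (mem >= self.threshold).float()  # fire when the threshold is reached
        mem = mem - spike * self.threshold       # soft reset after a spike
        return spike, mem


class ConvSpikingStream(nn.Module):
    """One stream (RGB or optical flow): convolution + pooling extract spatial
    features per frame; LIF membrane state carries information across frames."""
    def __init__(self, in_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.pool = nn.AvgPool2d(2)
        self.lif1, self.lif2 = LIFNeuron(), LIFNeuron()

    def forward(self, clip):                     # clip: (T, B, C, H, W)
        mem1 = mem2 = 0.0
        rate = 0.0
        for frame in clip:                       # iterate over time steps (frames)
            s1, mem1 = self.lif1(self.pool(self.conv1(frame)), mem1)
            s2, mem2 = self.lif2(self.pool(self.conv2(s1)), mem2)
            rate = rate + s2.flatten(1)          # accumulate spikes (rate coding)
        return rate / clip.shape[0]              # mean firing rate as the feature


class TwoStreamSpikingNet(nn.Module):
    """Concatenate RGB-stream and flow-stream features, then classify."""
    def __init__(self, feat_dim, num_classes=2):
        super().__init__()
        self.rgb_stream = ConvSpikingStream(in_channels=3)   # RGB frames
        self.flow_stream = ConvSpikingStream(in_channels=2)  # (dx, dy) optical flow
        self.classifier = nn.Linear(2 * feat_dim, num_classes)

    def forward(self, rgb_clip, flow_clip):
        feats = torch.cat([self.rgb_stream(rgb_clip),
                           self.flow_stream(flow_clip)], dim=1)
        return self.classifier(feats)


if __name__ == "__main__":
    T, B, H, W = 8, 1, 32, 32                    # toy clip: 8 frames of 32x32
    rgb = torch.rand(T, B, 3, H, W)
    flow = torch.rand(T, B, 2, H, W)
    feat_dim = 32 * (H // 4) * (W // 4)          # after two 2x2 pooling stages
    logits = TwoStreamSpikingNet(feat_dim)(rgb, flow)
    print(logits.shape)                          # torch.Size([1, 2])
```

The sketch covers only the forward pass; training spiking layers in practice requires a surrogate gradient for the non-differentiable thresholding step, a detail the abstract does not specify and which is omitted here.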