Multi-Level Two-Stream Fusion-Based Spatio-Temporal Attention Model for Violence Detection and Localization

Cited by: 8

Authors:

Asad, Mujtaba [1]

Jiang, He [1]

Yang, Jie [1]

Tu, Enmei [1]

Malik, Aftab A. [2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China

[2] Lahore Garrison Univ, Dept Software Engn, Lahore 54810, Pakistan

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2022 / Vol. 36 / No. 01

Keywords:

Violence detection; autonomous video surveillance; multi-layer feature fusion; spatio-temporal attention; RECOGNITION; NETWORKS;

DOI:

10.1142/S0218001422550023

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Detecting violent human behavior is necessary for public safety and monitoring. In human-based surveillance systems, however, it demands constant human observation and attention, which is a challenging task. Autonomous detection of violent human behavior is therefore essential for continuous, uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos using the fusion of spatio-temporal features and an attention model. The model consists of a Fusion Convolutional Neural Network (Fusion-CNN), spatio-temporal attention modules and Bi-directional Convolutional LSTMs (BiConvLSTM). The Fusion-CNN learns both spatial and temporal features by combining multi-level inter-layer features from both RGB and optical flow input frames. The spatial attention module generates an importance mask to focus on the most important areas of the image frame. The temporal attention part, which is based on BiConvLSTM, identifies the most significant video frames related to violent activity. The proposed model can also localize and discriminate prominent regions in both the spatial and temporal domains, given weakly supervised training with only video-level classification labels. Experimental results on several publicly available benchmark datasets show the superior performance of the proposed model in comparison with existing methods. Our model achieves improved accuracies (ACC) of 89.1%, 99.1% and 98.15% on the RWF-2000, HockeyFight and Crowd-Violence datasets, respectively. For the CCTV-FIGHTS dataset, we choose the mean average precision (mAP) as the performance metric, and our model obtains 80.7% mAP.

Pages: 25

32 items in total

[1] A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs
Mahmoodi, Javad
Nezamabadi-pour, Hossein
PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (02)
[2] A spatio-temporal attention fusion model for students behaviour recognition
Wang, Xiaoli
EAI ENDORSED TRANSACTIONS ON SCALABLE INFORMATION SYSTEMS, 2022, 9 (34)
[3] Stream-Flow Forecasting Based on Dynamic Spatio-Temporal Attention
Feng, Jun
Yan, Le
Hang, Tingting
IEEE ACCESS, 2019, 7 : 134754 - 134762
[4] Violence Detection Based on Spatio-Temporal Feature and Fisher Vector
Cai, Huangkai
Jiang, He
Huang, Xiaolin
Yang, Jie
He, Xiangjian
PATTERN RECOGNITION AND COMPUTER VISION (PRCV 2018), PT I, 2018, 11256 : 180 - 190
[5] A Violence Detection Approach Based on Spatio-temporal Hypergraph Transition
Huang, Jingjia
Li, Ge
Li, Nannan
Wang, Ronggang
Wang, Wenmin
COMPUTER ANALYSIS OF IMAGES AND PATTERNS: 17TH INTERNATIONAL CONFERENCE, CAIP 2017, PT II, 2017, 10425 : 218 - 229
[6] BiFAT: Bilateral Filtering and Attention Mechanisms in a Two-Stream Model for Deepfake Detection
Zhang, Lei
Yi, Ceyuan
Liu, Liang
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT II, 2024, 15017 : 231 - 247
[7] Real Time Violence Detection Based on Deep Spatio-Temporal Features
Xia, Qing
Zhang, Ping
Wang, JingJing
Tian, Ming
Fei, Chun
BIOMETRIC RECOGNITION, CCBR 2018, 2018, 10996 : 157 - 165
[8] Spatial-Temporal Attention Two-Stream Convolution Neural Network for Smoke Region Detection
Ding, Zhipeng
Zhao, Yaqin
Li, Ao
Zheng, Zhaoxiang
FIRE-SWITZERLAND, 2021, 4 (04)
[9] Attention-Based Multi-Level Feature Fusion for Object Detection in Remote Sensing Images
Dong, Xiaohu
Qin, Yao
Gao, Yinghui
Fu, Ruigang
Liu, Songlin
Ye, Yuanxin
REMOTE SENSING, 2022, 14 (15)
[10] Violence Detection Algorithm Based on Local Spatio-temporal Features and Optical Flow
Lyu, Yao
Yang, Yingyun
2015 INTERNATIONAL CONFERENCE ON INDUSTRIAL INFORMATICS - COMPUTING TECHNOLOGY, INTELLIGENT TECHNOLOGY, INDUSTRIAL INFORMATION INTEGRATION (ICIICII), 2015 : 307 - 311
