Multi-Level Two-Stream Fusion-Based Spatio-Temporal Attention Model for Violence Detection and Localization

Times cited: 8

Authors:

Asad, Mujtaba [1]

Jiang, He [1]

Yang, Jie [1]

Tu, Enmei [1]

Malik, Aftab A. [2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China

[2] Lahore Garrison Univ, Dept Software Engn, Lahore 54810, Pakistan

Source:

INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE | 2022 / Vol. 36 / No. 01

Keywords:

Violence detection; autonomous video surveillance; multi-layer feature fusion; spatio-temporal attention; RECOGNITION; NETWORKS

DOI:

10.1142/S0218001422550023

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Detection of violent human behavior is necessary for public safety and monitoring. However, in human-based surveillance systems it demands constant human observation and attention, which is a challenging task. Autonomous detection of violent human behavior is therefore essential for continuous, uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos that fuses spatio-temporal features with an attention model. The model consists of a Fusion Convolutional Neural Network (Fusion-CNN), spatio-temporal attention modules, and Bi-directional Convolutional LSTMs (BiConvLSTM). The Fusion-CNN learns both spatial and temporal features by combining multi-level inter-layer features from both RGB and optical flow input frames. The spatial attention module generates an importance mask that focuses on the most important areas of each image frame. The temporal attention part, which is based on BiConvLSTM, identifies the video frames most significantly related to violent activity. The proposed model can also localize and discriminate prominent regions in both the spatial and temporal domains, even though it is trained in a weakly supervised manner with only video-level classification labels. Experimental results on several publicly available benchmark datasets show that the proposed model outperforms existing methods. Our model achieves accuracies (ACC) of 89.1%, 99.1% and 98.15% on the RWF-2000, HockeyFight and Crowd-Violence datasets, respectively. For the CCTV-FIGHTS dataset, we use mean average precision (mAP) as the performance metric, and our model obtains 80.7% mAP.
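The abstract describes the fusion and attention pipeline only at a high level. The NumPy sketch below is a toy illustration under loudly stated assumptions, not the authors' implementation: the multi-level two-stream fusion is reduced to channel-wise concatenation of RGB and optical-flow feature maps that are assumed to already share one (T, H, W) grid; the spatial importance mask is a softmax over locations of a hypothetical 1x1 projection `w_s`; and the BiConvLSTM-based temporal module is swapped for a plain softmax over per-frame scores from a hypothetical vector `w_t`. All function names and weights (`w_s`, `w_t`, `w_cls`) are illustrative placeholders for learned parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    z = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(z)
    return e / np.sum(e, axis=axis, keepdims=True)


def fuse_two_stream(rgb_levels, flow_levels):
    """Hypothetical stand-in for multi-level two-stream fusion.
    Concatenates per-level feature maps from both streams along channels;
    assumes every map has already been resized to a common (T, H, W)."""
    return np.concatenate(list(rgb_levels) + list(flow_levels), axis=-1)


def spatial_attention(feats, w_s):
    """feats: (T, H, W, C) fused features; w_s: (C,) hypothetical projection.
    Returns a per-frame importance mask (T, H, W) that sums to 1 per frame,
    and the mask-weighted pooled feature of each frame (T, C)."""
    T, H, W, _ = feats.shape
    scores = feats @ w_s                                   # (T, H, W)
    mask = softmax(scores.reshape(T, H * W), axis=1).reshape(T, H, W)
    pooled = np.einsum("thw,thwc->tc", mask, feats)        # (T, C)
    return mask, pooled


def temporal_attention(frame_feats, w_t):
    """Simplified stand-in for the BiConvLSTM-based temporal module:
    a softmax over per-frame scores, then a weighted sum of frames."""
    alpha = softmax(frame_feats @ w_t, axis=0)             # (T,) frame weights
    video_vec = alpha @ frame_feats                        # (C,) video descriptor
    return alpha, video_vec


def violence_probability(feats, w_s, w_t, w_cls, b_cls=0.0):
    # Only this video-level probability would be trained against a label.
    mask, pooled = spatial_attention(feats, w_s)
    alpha, video_vec = temporal_attention(pooled, w_t)
    logit = float(video_vec @ w_cls + b_cls)
    prob = 1.0 / (1.0 + np.exp(-logit))                    # sigmoid
    return prob, mask, alpha


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H, W = 8, 7, 7
    # Random placeholder feature maps: two RGB levels and two flow levels.
    rgb_levels = [rng.standard_normal((T, H, W, 8)) for _ in range(2)]
    flow_levels = [rng.standard_normal((T, H, W, 8)) for _ in range(2)]
    feats = fuse_two_stream(rgb_levels, flow_levels)       # (T, H, W, 32)
    C = feats.shape[-1]
    w_s, w_t, w_cls = (rng.standard_normal(C) for _ in range(3))
    prob, mask, alpha = violence_probability(feats, w_s, w_t, w_cls)
    t_star = int(np.argmax(alpha))                         # most-attended frame
    y, x = np.unravel_index(np.argmax(mask[t_star]), (H, W))
    print(f"p(violence)={prob:.3f}, frame={t_star}, peak=({y}, {x})")
```

In this sketch the frame weights `alpha` and the per-frame masks are by-products of computing one video-level score, which is how it mirrors, in simplified form, the weakly supervised localization described in the abstract: the most-weighted frame gives a temporal location and the peak of its mask gives a spatial one.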

Pages: 25