Multi-Level Two-Stream Fusion-Based Spatio-Temporal Attention Model for Violence Detection and Localization

Cited by: 8
Authors
Asad, Mujtaba [1]
Jiang, He [1]
Yang, Jie [1]
Tu, Enmei [1]
Malik, Aftab A. [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Inst Image Proc & Pattern Recognit, Shanghai 200240, Peoples R China
[2] Lahore Garrison Univ, Dept Software Engn, Lahore 54810, Pakistan
Keywords
Violence detection; autonomous video surveillance; multi-layer feature fusion; spatio-temporal attention; RECOGNITION; NETWORKS;
DOI
10.1142/S0218001422550023
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Detecting violent human behavior is necessary for public safety and monitoring. In human-operated surveillance systems, however, it demands constant observation and attention, which is a challenging task. Autonomous detection of violent human behavior is therefore essential for continuous, uninterrupted video surveillance. In this paper, we propose a novel method for violence detection and localization in videos that fuses spatio-temporal features with an attention model. The model consists of a Fusion Convolutional Neural Network (Fusion-CNN), spatio-temporal attention modules and Bi-directional Convolutional LSTMs (BiConvLSTM). The Fusion-CNN learns both spatial and temporal features by combining multi-level inter-layer features from both RGB and optical flow input frames. The spatial attention module generates an importance mask that focuses on the most important areas of each image frame. The temporal attention module, which is based on BiConvLSTM, identifies the video frames most relevant to violent activity. The proposed model can also localize and discriminate prominent regions in both the spatial and temporal domains when trained in a weakly supervised manner with only video-level classification labels. Experimental results on several publicly available benchmark datasets show that the proposed model outperforms existing methods. Our model achieves accuracies (ACC) of 89.1%, 99.1% and 98.15% on the RWF-2000, HockeyFight and Crowd-Violence datasets, respectively. For the CCTV-FIGHTS dataset, we use the mean average precision (mAP) metric, on which our model obtains 80.7%.
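The two attention steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes fused per-frame feature maps are already available, and it replaces the BiConvLSTM-based temporal module with a plain linear scoring of frames. The function names, the scoring vectors `w` and `v`, and the tensor shapes are all hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(features, w):
    """Importance mask over spatial positions (illustrative only).

    features: (T, H, W, C) fused feature maps, one per frame (assumed input).
    w:        (C,) hypothetical scoring vector.
    Returns a mask of shape (T, H, W) that sums to 1 per frame, and the
    mask-weighted feature vector of each frame, shape (T, C).
    """
    T, H, W, C = features.shape
    scores = features @ w                                  # (T, H, W)
    mask = softmax(scores.reshape(T, H * W), axis=1).reshape(T, H, W)
    attended = (features * mask[..., None]).sum(axis=(1, 2))  # (T, C)
    return mask, attended


def temporal_attention(frame_vecs, v):
    """Frame-level weights (linear scoring stands in for BiConvLSTM).

    frame_vecs: (T, C) per-frame vectors.
    v:          (C,) hypothetical scoring vector.
    Returns weights of shape (T,) summing to 1 and a video descriptor (C,).
    """
    weights = softmax(frame_vecs @ v, axis=0)              # (T,)
    video_vec = (weights[:, None] * frame_vecs).sum(axis=0)  # (C,)
    return weights, video_vec


# Random stand-in data: 8 frames, 7x7 feature maps, 16 channels.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 7, 7, 16))
w = rng.normal(size=16)
v = rng.normal(size=16)

mask, frame_vecs = spatial_attention(features, w)
weights, video_vec = temporal_attention(frame_vecs, v)
```

In a weakly supervised setting of this kind, only `video_vec` would feed the video-level classifier, while `mask` and `weights` could be read afterwards as spatial and temporal localization cues.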
Pages: 25
Related papers
32 in total
  • [21] LARGE-SCALE SPATIO-TEMPORAL ATTENTION BASED ENTROPY MODEL FOR POINT CLOUD COMPRESSION
    Song, Rui
    Fu, Chunyang
    Liu, Shan
    Li, Ge
    2023 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO, ICME, 2023, : 2003 - 2008
  • [22] Spatio-temporal attention modules in orientation-magnitude-response guided multi-stream CNNs for human action recognition
    Khezerlou, Fatemeh
    Baradarani, Aryaz
    Balafar, Mohammad Ali
    Maev, Roman Gr.
    IET IMAGE PROCESSING, 2024, 18 (09) : 2372 - 2388
  • [23] Research on person re-identification based on multi-level attention model
    Wei, Dan
    Liang, Danyang
    Wu, Longfei
    Wang, Xiaolan
    Jiang, Lei
    Luo, Suyun
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (40) : 87459 - 87477
  • [24] Intelligent Target Detection in Synthetic Aperture Radar Images Based on Multi-Level Fusion
    Liu, Qiaoyu
    Ye, Ziqi
    Zhu, Chenxiang
    Ouyang, Dongxu
    Gu, Dandan
    Wang, Haipeng
    REMOTE SENSING, 2025, 17 (01)
  • [25] A novel spatio-temporal attention-based bidirectional LSTM model for moisture content prediction in drying process
    Zhang, Lei
    Ren, Guofeng
    Du, Jinsong
    Li, Shanlian
    Li, Yinhua
    Xu, Dayong
    DRYING TECHNOLOGY, 2024, 42 (14) : 2122 - 2136
  • [26] Temporal action detection based on two-stream You Only Look Once network for elderly care service robot
    Wang, Ke
    Li, Xuejing
    Yang, Jianhua
    Wu, Jun
    Li, Ruifeng
    INTERNATIONAL JOURNAL OF ADVANCED ROBOTIC SYSTEMS, 2021, 18 (04)
  • [27] Fault detection of offshore wind turbine gearboxes based on deep adaptive networks via considering Spatio-temporal fusion
    Zhu, Yongchao
    Zhu, Caichao
    Tan, Jianjun
    Song, Chaosheng
    Chen, Dingliang
    Zheng, Jie
    RENEWABLE ENERGY, 2022, 200 : 1023 - 1036
  • [28] Video anomaly detection based on multi-scale optical flow spatio-temporal enhancement and normality mining
    He, Qiang
    Shi, Ruinian
    Chen, Linlin
    Huo, Lianzhi
    INTERNATIONAL JOURNAL OF MACHINE LEARNING AND CYBERNETICS, 2025, 16 (03) : 1873 - 1888
  • [29] Imbalanced learning for wind turbine blade icing detection via spatio-temporal attention model with a self-adaptive weight loss function
    Jiang, Guoqian
    Yue, Ruxu
    He, Qun
    Xie, Ping
    Li, Xiaoli
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 229
  • [30] A multi-level damage assessment model based on change detection technology in remote sensing images
    Han, Dongzhe
    Yang, Guang
    Lu, Wangze
    Huang, Meng
    Liu, Shuai
    NATURAL HAZARDS, 2024, : 7367 - 7388