Audiovisual Dependency Attention for Violence Detection in Videos

Cited: 4
Authors
Pang, Wenfeng [1 ]
Xie, Wei [1 ]
He, Qianhua [1 ]
Li, Yanxiong [1 ]
Yang, Jichen [2 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Cyberspace Secur, Speech Informat Secur Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Audiovisual dependency attention; dependency map; violence detection; SCENES; MOVIES; FUSION
DOI
10.1109/TMM.2022.3184533
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Violence detection in videos can help maintain public order, detect crimes, and provide timely assistance. In this paper, we leverage multimodal information to determine whether successive frames contain violence. Specifically, we propose an audiovisual dependency attention (AVD-attention) module, modified from the co-attention architecture, to fuse visual and audio information, in contrast to commonly used methods such as feature concatenation, addition, and score fusion. Because the AVD-attention module's dependency map carries rich fusion information, we argue that it should be exploited more fully. A combination pooling method converts the dependency map into an attention vector, which can serve either as a new feature encoding fusion information or as a mask over the attention feature map. Since some information in the input features may be lost after processing by attention modules, we employ a multimodal low-rank bilinear method, which considers all pairwise interactions between the two features at each time step, to restore the original information in the module's output features. AVD-attention outperformed co-attention in experiments on the XD-Violence dataset, and our full system outperforms state-of-the-art systems.
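The abstract outlines three building blocks: a co-attention-style dependency map between the audio and visual streams, a combination pooling step that turns that map into attention vectors, and a low-rank bilinear complement for information lost by attention. The sketch below is a minimal PyTorch rendering of that pipeline; the paper's exact layer sizes, pooling combination, and fusion wiring are not given in the abstract, so the class AVDependencyAttention, its dimensions, and the sum-of-mean-and-max pooling are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a dependency-attention fusion block, assuming a
# co-attention-style design. All sizes and the specific pooling
# combination are guesses for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AVDependencyAttention(nn.Module):
    def __init__(self, dim=128, rank=64):
        super().__init__()
        # bilinear weight used to build the dependency map
        self.W = nn.Parameter(torch.randn(dim, dim) * 0.02)
        # low-rank bilinear (MLB-style) projections for the complement path
        self.U_v = nn.Linear(dim, rank)
        self.U_a = nn.Linear(dim, rank)
        self.P = nn.Linear(rank, dim)

    def forward(self, vis, aud):
        # vis: (B, Tv, D) visual features; aud: (B, Ta, D) audio features.
        # Dependency map: pairwise affinity between every visual and audio step.
        dep = torch.tanh(vis @ self.W @ aud.transpose(1, 2))  # (B, Tv, Ta)

        # "Combination pooling": average- and max-pool the map along each
        # axis and sum them, then normalize into per-step attention vectors.
        attn_v = F.softmax(dep.mean(dim=2) + dep.max(dim=2).values, dim=1)  # (B, Tv)
        attn_a = F.softmax(dep.mean(dim=1) + dep.max(dim=1).values, dim=1)  # (B, Ta)

        v_att = (attn_v.unsqueeze(2) * vis).sum(dim=1)  # (B, D) attended visual
        a_att = (attn_a.unsqueeze(2) * aud).sum(dim=1)  # (B, D) attended audio

        # Low-rank bilinear complement: all pairwise channel interactions
        # between the two features, approximated by a rank-limited
        # Hadamard product, to restore information lost by attention.
        z = self.P(torch.tanh(self.U_v(v_att)) * torch.tanh(self.U_a(a_att)))

        return v_att + a_att + z  # (B, D) fused audiovisual feature

# Usage: fuse 32-step visual and audio feature sequences of width 128.
# fused = AVDependencyAttention()(torch.randn(2, 32, 128), torch.randn(2, 32, 128))
```

The residual-style sum at the end reflects the abstract's claim that the bilinear term complements, rather than replaces, the attended features; how the attention vector is used as a mask versus a feature in the actual model is not specified here.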
Pages: 4922-4932
Page count: 11
Related Papers
50 records in total
  • [31] Violence Video Detection by Discriminative Slow Feature Analysis
    Wang, Kaiye
    Zhang, Zhang
    Wang, Liang
    PATTERN RECOGNITION, 2012, 321: 137+
  • [32] Weakly Supervised Audio-Visual Violence Detection
    Wu, Peng
    Liu, Xiaotao
    Liu, Jing
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 1674 - 1685
  • [33] Violence detection in videos using interest frame extraction and 3D convolutional neural network
    Mahmoodi, Javad
    Nezamabadi-pour, Hossein
    Abbasi-Moghadam, Dariush
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (15) : 20945 - 20961
  • [35] Implementation and Application of Violence Detection System Based on Multi-head Attention and LSTM
    Cao, Fengping
    Miao, Yi
    Zhang, Wangyi
    ADVANCED INTELLIGENT COMPUTING TECHNOLOGY AND APPLICATIONS, PT VII, ICIC 2024, 2024, 14868 : 77 - 88
  • [36] Semantic multimodal violence detection based on local-to-global embedding
    Pu, Yujiang
    Wu, Xiaoyu
    Wang, Shengjin
    Huang, Yuming
    Liu, Zihao
    Gu, Chaonan
    NEUROCOMPUTING, 2022, 514 : 148 - 161
  • [37] Balancing Accuracy and Training Time in Federated Learning for Violence Detection in Surveillance Videos: A Study of Neural Network Architectures
    Pajon, Quentin
    Serre, Swan
    Wissocq, Hugo
    Rabaud, Leo
    Haidar, Siba
    Yaacoub, Antoun
    JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY, 2024, 39 (05) : 1029 - 1039
  • [38] KianNet: A Violence Detection Model Using an Attention-Based CNN-LSTM Structure
    Vosta, Soheil
    Yow, Kin-Choong
    IEEE ACCESS, 2024, 12 : 2198 - 2209
  • [39] Mosaicking based optimal threshold image enhancement for violence detection with deep quadratic attention mechanism
    Elakiya, V.
    Aruna, P.
    Puviarasan, N.
    JOURNAL OF BIG DATA, 2024, 11 (01)
  • [40] Application of Sentiment Lexicons on Movies Transcripts to Detect Violence in Videos
    Alenzi, Badriya Murdhi
    Khan, Muhammad Badruddin
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2019, 10 (02) : 352 - 360