Audiovisual Dependency Attention for Violence Detection in Videos

Cited by: 4

Authors:

Pang, Wenfeng [1]

Xie, Wei [1]

He, Qianhua [1]

Li, Yanxiong [1]

Yang, Jichen [2]

Affiliations:

[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China

[2] Guangdong Polytech Normal Univ, Sch Cyberspace Secur, Speech Informat Secur Lab, Guangzhou 510640, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2023 / Vol. 25

Funding:

National Natural Science Foundation of China

Keywords:

Audiovisual dependency attention; dependency map; violence detection; SCENES; MOVIES; FUSION;

DOI:

10.1109/TMM.2022.3184533

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Violence detection in videos can help maintain public order, detect crimes, or provide timely assistance. In this paper, we aim to leverage multimodal information to determine whether successive frames contain violence. Specifically, we propose an audiovisual dependency attention (AVD-attention) module, modified from the co-attention architecture, to fuse visual and audio information, in contrast to commonly used methods such as feature concatenation, addition, and score fusion. Because the AVD-attention module's dependency map contains rich fusion information, we argue that it should be exploited more fully. A combination pooling method converts the dependency map into an attention vector, which can be treated either as a new feature that carries fusion information or as a mask for the attention feature map. Since some information in the input features may be lost after processing by attention modules, we employ a multimodal low-rank bilinear method, which considers all pairwise interactions between the two features at each time step, to complement the module's output features with the original information. In experiments on the XD-Violence dataset, AVD-attention outperformed co-attention, and the overall system outperformed state-of-the-art systems.
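
The fusion pipeline outlined in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the bilinear form of the dependency map, the mean-plus-max reading of "combination pooling", the use of the attention vector as a softmax mask, the low-rank bilinear form, and all names (`dependency_map`, `combination_pool`, `low_rank_bilinear`, `fuse`, and the parameter keys `W`, `U`, `V`, `P`) are hypothetical choices made for illustration. Visual and audio features are assumed to be aligned to the same number of time steps.

```python
import numpy as np

def dependency_map(visual, audio, weight):
    """Assumed form: affinity between every visual and audio time step, shape (T, T)."""
    return np.tanh(visual @ weight @ audio.T)

def combination_pool(dep_map, axis):
    """Assumed 'combination pooling': average of mean-pooling and max-pooling."""
    return 0.5 * (dep_map.mean(axis=axis) + dep_map.max(axis=axis))

def softmax(x):
    """Numerically stable softmax over a 1-D vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def low_rank_bilinear(x, y, U, V, P):
    """Low-rank bilinear fusion at each time step: (tanh(xU) * tanh(yV)) P."""
    return (np.tanh(x @ U) * np.tanh(y @ V)) @ P

def fuse(visual, audio, params):
    """Fuse visual (T, dv) and audio (T, da) features into (T, dv + da + d_out)."""
    dep = dependency_map(visual, audio, params["W"])
    # Pool the map along each axis to get one attention weight per time step.
    attn_v = softmax(combination_pool(dep, axis=1))  # one weight per visual step
    attn_a = softmax(combination_pool(dep, axis=0))  # one weight per audio step
    # Use the attention vectors as masks on the input features.
    attended_v = visual * attn_v[:, None]
    attended_a = audio * attn_a[:, None]
    # Bilinear term computed from the unmasked inputs, so it can
    # complement information the attention masks may have suppressed.
    bilinear = low_rank_bilinear(
        visual, audio, params["U"], params["V"], params["P"]
    )
    return np.concatenate([attended_v, attended_a, bilinear], axis=1)
```

In this sketch the bilinear branch reads the original features rather than the attended ones, which is one plausible reading of "complement the original information"; a trained system would learn `W`, `U`, `V`, and `P` rather than set them by hand.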


Pages: 4922-4932

Number of pages: 11
