Audiovisual Dependency Attention for Violence Detection in Videos

Times Cited: 4
Authors
Pang, Wenfeng [1 ]
Xie, Wei [1 ]
He, Qianhua [1 ]
Li, Yanxiong [1 ]
Yang, Jichen [2 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Cyberspace Secur, Speech Informat Secur Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual dependency attention; dependency map; violence detection; SCENES; MOVIES; FUSION;
DOI
10.1109/TMM.2022.3184533
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Violence detection in videos can help maintain public order, detect crimes, and provide timely assistance. In this paper, we leverage multimodal information to determine whether successive frames contain violence. Specifically, we propose an audiovisual dependency attention (AVD-attention) module, modified from the co-attention architecture, to fuse visual and audio information, in contrast to commonly used approaches such as feature concatenation, addition, and score fusion. Because the AVD-attention module's dependency map carries rich fusion information, we argue that it should be exploited more fully. A combination pooling method converts the dependency map into an attention vector, which can serve either as a new feature encoding the fused information or as a mask over the attention feature map. Since some information in the input features may be lost after processing by attention modules, we employ a multimodal low-rank bilinear method that models all pairwise interactions between the two features at each time step, complementing the module's output features with the original information. AVD-attention outperformed co-attention in experiments on the XD-Violence dataset, and our system outperforms state-of-the-art systems.
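The sketch below illustrates, in PyTorch-style Python, how a dependency-attention fusion block of the kind described in the abstract could be organized: a dependency map between visual and audio features, a combination pooling step that turns the map into an attention vector used as a mask, and a low-rank bilinear interaction that re-injects the original information. It is a minimal sketch based only on the abstract; the class name AVDAttentionSketch, the dimensions, the exact pooling rule (average plus max), and the bilinear formulation are illustrative assumptions, not the authors' implementation.

# Minimal sketch of an audiovisual dependency-attention style fusion block,
# based only on the abstract's description. Names, dimensions, and the exact
# pooling / low-rank bilinear choices are assumptions, not the authors' code.
import torch
import torch.nn as nn


class AVDAttentionSketch(nn.Module):
    def __init__(self, dim: int, rank: int = 64):
        super().__init__()
        # Projections used to build the audiovisual dependency map.
        self.q_visual = nn.Linear(dim, dim)
        self.k_audio = nn.Linear(dim, dim)
        # Low-rank bilinear (MLB-style) projections for pairwise interactions
        # between the two modalities at each time step.
        self.u_visual = nn.Linear(dim, rank)
        self.v_audio = nn.Linear(dim, rank)
        self.out = nn.Linear(rank, dim)

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # visual, audio: (batch, time, dim), assumed temporally aligned.
        q = self.q_visual(visual)                     # (B, T, D)
        k = self.k_audio(audio)                       # (B, T, D)
        dep_map = torch.matmul(q, k.transpose(1, 2))  # (B, T, T) dependency map

        # "Combination pooling": merge average- and max-pooling over the map
        # into a per-time-step attention vector (one plausible reading of the
        # abstract, chosen here for illustration).
        attn = 0.5 * dep_map.mean(dim=-1) + 0.5 * dep_map.max(dim=-1).values
        attn = torch.softmax(attn, dim=-1).unsqueeze(-1)  # (B, T, 1)

        # Use the attention vector as a mask over the visual stream.
        attended_visual = attn * visual

        # Low-rank bilinear interaction complements information that the
        # attention step may have suppressed.
        bilinear = torch.tanh(self.u_visual(attended_visual)) * torch.tanh(self.v_audio(audio))
        return self.out(bilinear) + visual            # residual keeps original info

The residual connection at the end mirrors the abstract's point that attention processing can discard input information, so the original visual features are added back alongside the bilinear interaction term.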
Pages: 4922-4932
Number of pages: 11
Related Papers
50 records in total (showing [41]-[50])
  • [41] Investigating Multimodal Audiovisual Event Detection and Localization
    Vryzas, N.
    Kotsakis, R.
    Dimoulas, C. A.
    Kalliris, G.
    PROCEEDINGS OF AUDIO MOSTLY 2016 - A CONFERENCE ON INTERACTION WITH SOUND IN COOPERATION WITH ACM, 2016, : 97 - 104
  • [42] On pedestrian detection and tracking in infrared videos
    Wang, Jiang-tao
    Chen, De-bao
    Chen, Hai-yan
    Yang, Jing-yu
    PATTERN RECOGNITION LETTERS, 2012, 33 (06) : 775 - 785
  • [43] Look, Listen and Pay More Attention: Fusing Multi-Modal Information for Video Violence Detection
    Wei, Dong-Lai
    Liu, Chen-Geng
    Liu, Yang
    Liu, Jing
    Zhu, Xiao-Guang
    Zeng, Xin-Hua
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1980 - 1984
  • [44] Optical Flow-Aware-Based Multi-Modal Fusion Network for Violence Detection
    Xiao, Yang
    Gao, Guxue
    Wang, Liejun
    Lai, Huicheng
    ENTROPY, 2022, 24 (07)
  • [45] Comparative Analysis: Violence Recognition from Videos using Computer Vision
    Dashdamirov, Dursun
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES, AICT 2024, 2024,
  • [46] Lightweight Violence Detection Model Based on 2D CNN with Bi-Directional Motion Attention
    Wang, Jingwen
    Zhao, Daqi
    Li, Haoming
    Wang, Deqiang
    APPLIED SCIENCES-BASEL, 2024, 14 (11):
  • [47] Machine Cognition of Violence in Videos using Novel Outlier-Resistant VLAD
    Deb, Tonmoay
    Arman, Aziz
    Firoze, Adnan
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 989 - 994
  • [48] A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs
    Mahmoodi, Javad
    Nezamabadi-pour, Hossein
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (02)
  • [49] Fast Violence Detection in Video
    Deniz, Oscar
    Serrano, Ismael
    Bueno, Gloria
    Kim, Tae-Kyun
    PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, THEORY AND APPLICATIONS (VISAPP 2014), VOL 2, 2014, : 478 - 485
  • [50] Violence detection in compressed video
    Honarjoo, Narges
    Abdari, Ali
    Mansouri, Azadeh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 73703 - 73716