Audiovisual Dependency Attention for Violence Detection in Videos

Times Cited: 4
Authors
Pang, Wenfeng [1 ]
Xie, Wei [1 ]
He, Qianhua [1 ]
Li, Yanxiong [1 ]
Yang, Jichen [2 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou 510640, Peoples R China
[2] Guangdong Polytech Normal Univ, Sch Cyberspace Secur, Speech Informat Secur Lab, Guangzhou 510640, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Audiovisual dependency attention; dependency map; violence detection; SCENES; MOVIES; FUSION;
DOI
10.1109/TMM.2022.3184533
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
Violence detection in videos can help maintain public order, detect crimes, and provide timely assistance. In this paper, we leverage multimodal information to determine whether successive frames contain violence. Specifically, we propose an audiovisual dependency attention (AVD-attention) module, modified from the co-attention architecture, to fuse visual and audio information, in contrast to commonly used approaches such as feature concatenation, addition, and score fusion. Because the AVD-attention module's dependency map carries rich fusion information, we argue that it should be exploited more fully. A combination pooling method converts the dependency map into an attention vector, which can serve either as a new feature encoding the fused information or as a mask over the attention feature map. Since some information in the input features may be lost after processing by attention modules, we employ a multimodal low-rank bilinear method that models all pairwise interactions between the two features at each time step, complementing the module's output features with the original information. AVD-attention outperformed co-attention in experiments on the XD-Violence dataset, and our system outperforms state-of-the-art systems.
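The sketch below illustrates, in PyTorch-style Python, how a dependency-attention fusion block of the kind described in the abstract could be organized: a dependency map between visual and audio features, a combination pooling step that turns the map into an attention vector used as a mask, and a low-rank bilinear interaction that re-injects the original information. It is a minimal sketch based only on the abstract; the class name AVDAttentionSketch, the dimensions, the exact pooling rule (average plus max), and the bilinear formulation are illustrative assumptions, not the authors' implementation.

# Minimal sketch of an audiovisual dependency-attention style fusion block,
# based only on the abstract's description. Names, dimensions, and the exact
# pooling / low-rank bilinear choices are assumptions, not the authors' code.
import torch
import torch.nn as nn


class AVDAttentionSketch(nn.Module):
    def __init__(self, dim: int, rank: int = 64):
        super().__init__()
        # Projections used to build the audiovisual dependency map.
        self.q_visual = nn.Linear(dim, dim)
        self.k_audio = nn.Linear(dim, dim)
        # Low-rank bilinear (MLB-style) projections for pairwise interactions
        # between the two modalities at each time step.
        self.u_visual = nn.Linear(dim, rank)
        self.v_audio = nn.Linear(dim, rank)
        self.out = nn.Linear(rank, dim)

    def forward(self, visual: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # visual, audio: (batch, time, dim), assumed temporally aligned.
        q = self.q_visual(visual)                     # (B, T, D)
        k = self.k_audio(audio)                       # (B, T, D)
        dep_map = torch.matmul(q, k.transpose(1, 2))  # (B, T, T) dependency map

        # "Combination pooling": merge average- and max-pooling over the map
        # into a per-time-step attention vector (one plausible reading of the
        # abstract, chosen here for illustration).
        attn = 0.5 * dep_map.mean(dim=-1) + 0.5 * dep_map.max(dim=-1).values
        attn = torch.softmax(attn, dim=-1).unsqueeze(-1)  # (B, T, 1)

        # Use the attention vector as a mask over the visual stream.
        attended_visual = attn * visual

        # Low-rank bilinear interaction complements information that the
        # attention step may have suppressed.
        bilinear = torch.tanh(self.u_visual(attended_visual)) * torch.tanh(self.v_audio(audio))
        return self.out(bilinear) + visual            # residual keeps original info

The residual connection at the end mirrors the abstract's point that attention processing can discard input information, so the original visual features are added back alongside the bilinear interaction term.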
Pages: 4922-4932
Number of pages: 11
Related Papers
50 records in total (showing [41]-[50])
  • [41] Investigating Multimodal Audiovisual Event Detection and Localization
    Vryzas, N.
    Kotsakis, R.
    Dimoulas, C. A.
    Kalliris, G.
    PROCEEDINGS OF AUDIO MOSTLY 2016 - A CONFERENCE ON INTERACTION WITH SOUND IN COOPERATION WITH ACM, 2016, : 97 - 104
  • [42] On pedestrian detection and tracking in infrared videos
    Wang, Jiang-tao
    Chen, De-bao
    Chen, Hai-yan
    Yang, Jing-yu
    PATTERN RECOGNITION LETTERS, 2012, 33 (06) : 775 - 785
  • [43] Look, Listen and Pay More Attention: Fusing Multi-Modal Information for Video Violence Detection
    Wei, Dong-Lai
    Liu, Chen-Geng
    Liu, Yang
    Liu, Jing
    Zhu, Xiao-Guang
    Zeng, Xin-Hua
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 1980 - 1984
  • [44] Optical Flow-Aware-Based Multi-Modal Fusion Network for Violence Detection
    Xiao, Yang
    Gao, Guxue
    Wang, Liejun
    Lai, Huicheng
    ENTROPY, 2022, 24 (07)
  • [45] Comparative Analysis: Violence Recognition from Videos using Computer Vision
    Dashdamirov, Dursun
    2024 IEEE 18TH INTERNATIONAL CONFERENCE ON APPLICATION OF INFORMATION AND COMMUNICATION TECHNOLOGIES, AICT 2024, 2024,
  • [46] Lightweight Violence Detection Model Based on 2D CNN with Bi-Directional Motion Attention
    Wang, Jingwen
    Zhao, Daqi
    Li, Haoming
    Wang, Deqiang
    APPLIED SCIENCES-BASEL, 2024, 14 (11):
  • [47] Machine Cognition of Violence in Videos using Novel Outlier-Resistant VLAD
    Deb, Tonmoay
    Arman, Aziz
    Firoze, Adnan
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 989 - 994
  • [48] A spatio-temporal model for violence detection based on spatial and temporal attention modules and 2D CNNs
    Mahmoodi, Javad
    Nezamabadi-pour, Hossein
    PATTERN ANALYSIS AND APPLICATIONS, 2024, 27 (02)
  • [49] Fast Violence Detection in Video
    Deniz, Oscar
    Serrano, Ismael
    Bueno, Gloria
    Kim, Tae-Kyun
    PROCEEDINGS OF THE 2014 9TH INTERNATIONAL CONFERENCE ON COMPUTER VISION, THEORY AND APPLICATIONS (VISAPP 2014), VOL 2, 2014, : 478 - 485
  • [50] Violence detection in compressed video
    Honarjoo, Narges
    Abdari, Ali
    Mansouri, Azadeh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (29) : 73703 - 73716