Multi-level channel attention excitation network for human action recognition in videos

Cited by: 4
Authors
Wu, Hanbo [1]
Ma, Xin [1]
Li, Yibin [1]
Affiliations
[1] Shandong Univ, Ctr Robot, Sch Control Sci & Engn, Jinan, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Human action recognition; 2D CNNs; Channel attention; Spatiotemporal modeling;
DOI
10.1016/j.image.2023.116940
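Chinese Library Classification number
TM [Electrical engineering]; TN [Electronic technology, communication technology]
Discipline classification codes
0808; 0809
Abstract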
Channel attention mechanisms have attracted sustained interest and shown great potential for enhancing the performance of deep CNNs. However, when applied to video-based human action recognition, most existing methods learn channel attention at the frame level only, which ignores temporal dependencies and may limit recognition performance. In this paper, we propose a novel multi-level channel attention excitation (MCAE) module that models temporally related channel attention at both the frame and video levels. Specifically, based on the video's convolutional feature maps, frame-level channel attention (FCA) is generated by exploring time-channel correlations, and video-level channel attention (VCA) is generated by aggregating global motion variations. MCAE first recalibrates the video feature responses with the frame-wise FCA, and then activates the motion-sensitive channel features with the motion-aware VCA. The MCAE module thus learns channel discriminability at multiple levels and can guide efficient spatiotemporal feature modeling in the activated motion-sensitive channels. It can be flexibly embedded into 2D networks at very limited extra computational cost to construct MCAE-Net, which effectively enhances the spatiotemporal representation of 2D models for video action recognition. Extensive experiments on five human action datasets show that our method achieves superior or very competitive performance compared with state-of-the-art methods, demonstrating its effectiveness for improving human action recognition.
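The two-stage recalibration described in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract does not specify the layers, so the temporal smoothing kernel, the use of absolute frame differences as the "motion variation" signal, the weight matrix `w`, the choice to compute VCA on the FCA-recalibrated features, and all function names are assumptions made purely for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def frame_level_attention(x, kernel=(0.25, 0.5, 0.25)):
    """Hypothetical FCA: x has shape (T, C, H, W); returns (T, C) weights."""
    s = x.mean(axis=(2, 3))                          # spatial pooling -> (T, C)
    padded = np.pad(s, ((1, 1), (0, 0)), mode="edge")
    # Assumed stand-in for "time-channel correlations": mix each channel
    # descriptor with its temporal neighbours using a fixed 3-tap kernel.
    mixed = sum(k * padded[i:i + s.shape[0]] for i, k in enumerate(kernel))
    return sigmoid(mixed)

def video_level_attention(x, w):
    """Hypothetical VCA: x has shape (T, C, H, W); returns (C,) weights."""
    s = x.mean(axis=(2, 3))                          # (T, C)
    # Assumed "global motion variation": mean absolute frame-to-frame change.
    motion = np.abs(np.diff(s, axis=0)).mean(axis=0)  # (C,)
    return sigmoid(motion @ w)                       # w is an assumed (C, C) matrix

def mcae_sketch(x, w):
    """Apply frame-wise FCA first, then motion-aware VCA."""
    fca = frame_level_attention(x)
    y = x * fca[:, :, None, None]                    # per-frame recalibration
    vca = video_level_attention(y, w)
    return y * vca[None, :, None, None]              # per-video channel gating

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 7, 7))               # 8 frames, 16 channels
w = rng.standard_normal((16, 16)) * 0.1
out = mcae_sketch(x, w)
print(out.shape)  # (8, 16, 7, 7)
```

The output keeps the input shape, which is what would let a module of this kind be inserted between existing 2D convolutional blocks; in a trained network the fixed kernel and random `w` would be replaced by learned parameters.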
Pages: 11