Local-aware spatio-temporal attention network with multi-stage feature fusion for human action recognition

Cited by: 0
Authors
Yaqing Hou
Hua Yu
Dongsheng Zhou
Pengfei Wang
Hongwei Ge
Jianxin Zhang
Qiang Zhang
Affiliations
[1] Dalian University of Technology, School of Computer Science and Technology
[2] Dalian University, School of Software Engineering
[3] Dalian Minzu University, School of Computer Science and Engineering
Source
Neural Computing and Applications | 2021 / Vol. 33
Keywords
Spatio-temporal attention networks; Spatial transformer network; Feature fusion; Human action recognition;
DOI
Not available
Abstract
Two-stream networks have recently made substantial progress in human action recognition. However, distinguishing similar human actions in videos remains challenging. This paper proposes a novel local-aware spatio-temporal attention network with multi-stage feature fusion based on compact bilinear pooling for human action recognition. Taking two-stream networks as the backbone, the spatial network first employs multiple spatial transformer networks in parallel to locate the discriminative regions related to human actions. The local and global features are then fused to enhance the human action representation. Furthermore, the output of the spatial network and the temporal information are fused at a particular layer to learn pixel-wise correspondences. Finally, the three outputs are combined to generate the global descriptors of human actions. To verify the efficacy of the proposed approach, comparison experiments are conducted with traditional hand-engineered improved dense trajectories (IDT) algorithms, classical machine learning methods (e.g., SVM) and state-of-the-art deep learning methods (e.g., spatio-temporal multiplier networks). According to the reported results, the approach obtains the best performance among the compared works, with accuracies of 95.3% and 72.9% on UCF101 and HMDB51, respectively. The experimental results thus demonstrate the effectiveness of the proposed architecture for human action recognition.
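The abstract names compact bilinear pooling as the fusion operator but gives no implementation details. The following is a minimal sketch of the commonly used Tensor Sketch formulation, assuming two plain feature vectors as inputs. All function names, dimensions, and the random hash parameters are hypothetical and chosen for illustration; they are not taken from the paper. For readability, the circular convolution is computed directly in O(d^2) rather than with the FFT that practical implementations typically use.

```python
import random


def make_hash(n, d, rng):
    """Draw a bucket index in [0, d) and a sign in {-1, +1} for each of n input dims."""
    h = [rng.randrange(d) for _ in range(n)]
    s = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return h, s


def count_sketch(x, h, s, d):
    """Project x to d dims: each signed entry is added to its hashed bucket."""
    out = [0.0] * d
    for xi, hi, si in zip(x, h, s):
        out[hi] += si * xi
    return out


def circular_conv(a, b):
    """Direct circular convolution of two equal-length vectors (FFT omitted for clarity)."""
    d = len(a)
    out = [0.0] * d
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[(i + j) % d] += ai * bj
    return out


def compact_bilinear(x, y, hx, sx, hy, sy, d):
    """Fuse x and y into a d-dim descriptor without forming the outer product."""
    return circular_conv(count_sketch(x, hx, sx, d), count_sketch(y, hy, sy, d))


def outer_product_sketch(x, y, hx, sx, hy, sy, d):
    """Reference: count sketch of the explicit outer product of x and y."""
    out = [0.0] * d
    for i, xi in enumerate(x):
        for j, yj in enumerate(y):
            out[(hx[i] + hy[j]) % d] += sx[i] * sy[j] * xi * yj
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    d = 8  # hypothetical output dimension
    local_feat = [rng.uniform(-1, 1) for _ in range(6)]   # stand-in for a local feature
    global_feat = [rng.uniform(-1, 1) for _ in range(5)]  # stand-in for a global feature
    hx, sx = make_hash(len(local_feat), d, rng)
    hy, sy = make_hash(len(global_feat), d, rng)
    fused = compact_bilinear(local_feat, global_feat, hx, sx, hy, sy, d)
    print(len(fused))
```

The fused vector equals the count sketch of the full outer product of the two inputs, so the pairwise feature interactions are kept while the output dimension stays at d instead of growing to the product of the input dimensions.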
Pages: 16439-16450
Number of pages: 11
Related papers
50 in total
  • [31] A multi-stage spatio-temporal adaptive network for video super-resolution
    Zhang, Yuhang
    Chen, Zhenzhong
    Liu, Shan
    JOURNAL OF VISUAL COMMUNICATION AND IMAGE REPRESENTATION, 2022, 87
  • [32] Spatio-temporal SRU with global context-aware attention for 3D human action recognition
    She, Qingshan
    Mu, Gaoyuan
    Gan, Haitao
    Fan, Yingle
    MULTIMEDIA TOOLS AND APPLICATIONS, 2020, 79 (17-18) : 12349 - 12371
  • [33] Spatio-temporal SRU with global context-aware attention for 3D human action recognition
    Qingshan She
    Gaoyuan Mu
    Haitao Gan
    Yingle Fan
    Multimedia Tools and Applications, 2020, 79 : 12349 - 12371
  • [34] A Dual Pipeline With Spatio-Temporal Attention Fusion Approach for Human Activity Recognition
    Wang, Xiaodong
    Li, Ying
    Fang, Aiqing
    He, Pei
    Guo, Yangming
    IEEE SENSORS JOURNAL, 2024, 24 (15) : 25150 - 25162
  • [35] Dual Stream Spatio-Temporal Motion Fusion With Self-Attention For Action Recognition
    Jalal, Md Asif
    Aftab, Waqas
    Moore, Roger K.
    Mihaylova, Lyudmila
    2019 22ND INTERNATIONAL CONFERENCE ON INFORMATION FUSION (FUSION 2019), 2019,
  • [36] A fast human action recognition network based on spatio-temporal features
    Xu, Jie
    Song, Rui
    Wei, Haoliang
    Guo, Jinhong
    Zhou, Yifei
    Huang, Xiwei
    NEUROCOMPUTING, 2021, 441 : 350 - 358
  • [37] Weakly Supervised Temporal Action Localization by Multi-Stage Fusion Network
    Shen, Zhengyang
    Wang, Feng
    Dai, Jin
    IEEE ACCESS, 2020, 8 : 17287 - 17298
  • [38] A fast human action recognition network based on spatio-temporal features
    Xu, Jie
    Song, Rui
    Wei, Haoliang
    Guo, Jinhong
    Zhou, Yifei
    Huang, Xiwei
    Neurocomputing, 2021, 441 : 350 - 358
  • [39] Human Action Recognition Algorithm Based on Spatio-Temporal Interactive Attention Model
    Pan Na
    Jiang Min
    Kong Jun
    LASER & OPTOELECTRONICS PROGRESS, 2020, 57 (18)
  • [40] Spatio-temporal information for human action recognition
    Yao, Li
    Liu, Yunjian
    Huang, Shihui
    EURASIP JOURNAL ON IMAGE AND VIDEO PROCESSING, 2016,