Learning Attention-Enhanced Spatiotemporal Representation for Action Recognition

Cited by: 11
Authors
Shi, Zhensheng [1]
Cao, Liangjie [1]
Guan, Cheng [1]
Zheng, Haiyong [1]
Gu, Zhaorui [1]
Yu, Zhibin [1]
Zheng, Bing [1]
Affiliation
[1] Ocean Univ China, Dept Elect Engn, Qingdao 266100, Peoples R China
Source
IEEE ACCESS | 2020 / Vol. 8 / Issue 08
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; video understanding; spatiotemporal representation; visual attention; 3D-CNN; residual learning;
DOI
10.1109/ACCESS.2020.2968024
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Learning spatiotemporal features via 3D-CNN (3D Convolutional Neural Network) models is regarded as an effective approach for action recognition. In this paper, we explore visual attention mechanisms for video analysis and propose a novel 3D-CNN model, dubbed AE-I3D (Attention-Enhanced Inflated-3D Network), for learning attention-enhanced spatiotemporal representations. The contribution of our AE-I3D is threefold. First, we inflate soft attention to the spatiotemporal scope of 3D videos, and adopt softmax to generate a probability distribution over attentional features in a feedforward 3D-CNN architecture. Second, we devise an AE-Res (Attention-Enhanced Residual learning) module, which learns attention-enhanced features through two-branch residual learning; the AE-Res module is also lightweight and flexible, so it can be easily embedded into many 3D-CNN architectures. Finally, we embed multiple AE-Res modules into an I3D (Inflated-3D) network, yielding our AE-I3D model, which can be trained in an end-to-end, video-level manner. Unlike previous attention networks, our method inflates residual attention from 2D images to 3D videos, performing 3D attention residual learning to enhance the spatiotemporal representation. We use RGB-only video data for evaluation on three benchmarks: UCF101, HMDB51, and Kinetics. The experimental results demonstrate that our AE-I3D is effective and achieves competitive performance.
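The soft-attention idea described in the abstract can be illustrated with a small Python sketch. This is not the authors' AE-Res implementation, whose details the abstract does not give. It assumes a single-channel T x H x W feature volume, a softmax taken jointly over all spatiotemporal positions, and the commonly used residual-attention form out = F * (1 + A). The function names `spatiotemporal_softmax` and `attention_enhanced_residual` are hypothetical.

```python
import math


def spatiotemporal_softmax(scores):
    """Softmax over every (t, h, w) position of a T x H x W score volume.

    `scores` is a nested list [T][H][W]; the result has the same shape
    and its entries sum to 1 (a probability distribution over positions).
    """
    flat = [s for frame in scores for row in frame for s in row]
    peak = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in flat]
    total = sum(exps)
    weights = iter(e / total for e in exps)
    # Rebuild the [T][H][W] nesting in the same order it was flattened.
    return [[[next(weights) for _ in row] for row in frame] for frame in scores]


def attention_enhanced_residual(features, scores):
    """Assumed two-branch form: identity branch plus attention-weighted branch.

    out[t][h][w] = features[t][h][w] * (1 + attention[t][h][w])
    """
    attention = spatiotemporal_softmax(scores)
    return [
        [
            [f * (1.0 + a) for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(f_frame, a_frame)
        ]
        for f_frame, a_frame in zip(features, attention)
    ]
```

With uniform scores over a 2 x 2 x 2 volume, every position receives attention 1/8, so each feature is scaled by 1.125. Raising one position's score shifts attention mass toward it, while the identity branch keeps the original features intact.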
Pages: 16785 - 16794
Page count: 10
Related papers
50 in total
  • [31] A Deep Learning Network for Action Recognition Incorporating Temporal Attention Mechanism
    Liu, Yue
    Zhang, Lei
    Xin, Shan
    Zhang, Yu
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND BIOMIMETICS (IEEE-ROBIO 2021), 2021, : 1576 - 1581
  • [32] Discriminative Relational Representation Learning for RGB-D Action Recognition
    Kong, Yu
    Fu, Yun
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (06) : 2856 - 2865
  • [33] Collaboratively Self-Supervised Video Representation Learning for Action Recognition
    Zhang, Jie
    Wan, Zhifan
    Hu, Lanqing
    Lin, Stephen
    Wu, Shuzhe
    Shan, Shiguang
    IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2025, 20 : 1895 - 1907
  • [34] Exploiting Spatiotemporal Features for Action Recognition
    Bin Muslim, Usairam
    Khan, Muhammad Hassan
    Farid, Muhammad Shahid
    PROCEEDINGS OF 2021 INTERNATIONAL BHURBAN CONFERENCE ON APPLIED SCIENCES AND TECHNOLOGIES (IBCAST), 2021, : 613 - 619
  • [35] Representation Learning of Temporal Dynamics for Skeleton-Based Action Recognition
    Du, Yong
    Fu, Yun
    Wang, Liang
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2016, 25 (07) : 3010 - 3022
  • [36] Deep Video Understanding: Representation Learning, Action Recognition, and Language Generation
    Mei, Tao
    PROCEEDINGS OF THE 1ST WORKSHOP AND CHALLENGE ON COMPREHENSIVE VIDEO UNDERSTANDING IN THE WILD (COVIEW'18), 2018, : 1 - 1
  • [37] Nonnegative Component Representation with Hierarchical Dictionary Learning Strategy for Action Recognition
    Wang, Jianhong
    Zhang, Pinzheng
    Luo, Linmin
    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2016, E99D (04) : 1259 - 1263
  • [38] Multi-Group Multi-Attention: Towards Discriminative Spatiotemporal Representation
    Shi, Zhensheng
    Cao, Liangjie
    Guan, Cheng
    Liang, Ju
    Li, Qianqian
    Gu, Zhaorui
    Zheng, Haiyong
    Zheng, Bing
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 2057 - 2066
  • [39] Spatiotemporal decoupling attention transformer for 3D skeleton-based driver action recognition
    Xu, Zhuoyan
    Xu, Jingke
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (04)
  • [40] Learning SpatioTemporal and Motion Features in a Unified 2D Network for Action Recognition
    Wang, Mengmeng
    Xing, Jiazheng
    Su, Jing
    Chen, Jun
    Liu, Yong
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (03) : 3347 - 3362