Beyond coordinate attention: spatial-temporal recalibration and channel scaling for skeleton-based action recognition

Cited by: 2
Authors
Tang, Jun [1, 2]
Gong, Sihang [1]
Wang, Yanjiang [1]
Liu, Baodi [1]
Du, Chunyu [1]
Gu, Boyang [1]
Affiliations
[1] China Univ Petr East China, Coll Control Sci & Engn, Qingdao 266580, Peoples R China
[2] Qingdao Agr Univ, Coll Animat & Commun, Qingdao 266109, Peoples R China
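Funding
National Natural Science Foundation of China
Keywords
Lightweight attention mechanism; Long-range dependency; Graph convolutional network; Skeleton-based action recognition; Object detection; Semantic segmentation
DOI
10.1007/s11760-023-02747-0
Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline code
0808; 0809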
Abstract
Skeleton-based action recognition is an active research topic in computer vision. Recent lightweight attention mechanisms (e.g., coordinate attention) have proven highly effective for skeleton-based action recognition. However, because coordinate attention captures long-range dependencies along the spatial and temporal directions separately, it cannot capture accurate long-range dependencies across the entire spatio-temporal domain, which inevitably leads to inaccurate spatio-temporal localization. In this work, we propose an efficient, lightweight attention mechanism called coordinate enhanced attention, which consists of spatial-temporal recalibration and channel scaling. Spatial-temporal recalibration aims to capture precise long-range dependencies directly in the entire spatio-temporal domain, and channel scaling is introduced to make efficient use of multi-channel weight information. Because it is lightweight, coordinate enhanced attention can be easily integrated into classical neural networks. On two large-scale skeleton-based action recognition datasets (NTU RGB+D 60 and NTU RGB+D 120), it yields consistent improvements. Experiments on two popular object detection datasets (COCO and Pascal VOC) and a semantic segmentation dataset (Cityscapes) indicate that it outperforms other lightweight attention mechanisms, further validating its transferability.
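The abstract's central criticism is that coordinate attention pools along the temporal and spatial axes separately. The minimal NumPy sketch below illustrates one consequence of that design. It assumes a (C, T, V) skeleton feature layout (channels, frames, joints) and uses random matrices as stand-ins for learned weights; the function name and the parameters `w_reduce`, `w_t`, and `w_v` are hypothetical. Because the temporal and spatial gates are combined by multiplication, each channel's weight map is an outer product and therefore has rank 1 over the T×V grid. This models only the generic coordinate-attention baseline adapted to skeletons, not the authors' coordinate enhanced attention, whose exact layers are not described here.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def coordinate_attention_weights(x, w_reduce, w_t, w_v):
    """Coordinate-attention-style weight maps of shape (C, T, V).

    x        : (C, T, V) skeleton features (channels, frames, joints)
    w_reduce : (R, C) shared channel-reduction weights (hypothetical stand-in)
    w_t, w_v : (C, R) weights producing the temporal / spatial gates (hypothetical)
    """
    T = x.shape[1]
    pooled_t = x.mean(axis=2)                          # (C, T): average over joints
    pooled_v = x.mean(axis=1)                          # (C, V): average over frames
    y = np.concatenate([pooled_t, pooled_v], axis=1)   # (C, T + V)
    y = np.maximum(w_reduce @ y, 0.0)                  # (R, T + V), shared transform + ReLU
    a_t = sigmoid(w_t @ y[:, :T])                      # (C, T) temporal gate
    a_v = sigmoid(w_v @ y[:, T:])                      # (C, V) spatial gate
    # Weight for position (c, t, v) is a_t[c, t] * a_v[c, v]: a per-channel outer product.
    return a_t[:, :, None] * a_v[:, None, :]


rng = np.random.default_rng(0)
C, R, T, V = 8, 4, 16, 25          # 25 joints, as in NTU RGB+D skeletons
x = rng.standard_normal((C, T, V))
w_reduce = rng.standard_normal((R, C))
w_t = rng.standard_normal((C, R))
w_v = rng.standard_normal((C, R))

weights = coordinate_attention_weights(x, w_reduce, w_t, w_v)
out = x * weights                  # recalibrated features, same shape as x
ranks = [int(np.linalg.matrix_rank(weights[c])) for c in range(C)]
print(out.shape, ranks)            # prints (8, 16, 25) [1, 1, 1, 1, 1, 1, 1, 1]
```

A rank-1 map cannot, for example, emphasize the hand joints only in early frames and the feet only in late frames without also raising the weights for hands in late frames and feet in early frames. This is one concrete reading of the "inaccurate spatio-temporal localization" that the abstract attributes to separate per-axis pooling.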
Pages: 199-206
Page count: 8