Dynamic Semantic-Based Spatial-Temporal Graph Convolution Network for Skeleton-Based Human Action Recognition

Cited by: 0
Authors
Xie, Jianyang [1]
Meng, Yanda [2]
Zhao, Yitian [3]
Nguyen, Anh [4]
Yang, Xiaoyun [5]
Zheng, Yalin [2]
Affiliations
[1] Univ Liverpool, Sch EEECS, CDT Distributed Algorithms, Liverpool L7 8TS, England
[2] Univ Liverpool, Dept Eye & Vis Sci, Liverpool L7 8TS, England
[3] Ningbo Inst Ind Technol, Ningbo 312501, Peoples R China
[4] Univ Liverpool, Dept Comp Sci, Liverpool L7 8TS, England
[5] Remark Holdings, Las Vegas, NV 89106 USA
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Semantics; Skeleton; Convolution; Human activity recognition; Graph neural networks; Encoding; Adaptation models; Legged locomotion; Joints; Image coding; Human action recognition; skeleton-based; semantics encoding; joints/edge type; frames occurrence order; graph convolution network;
DOI
10.1109/TIP.2024.3497837
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Human action recognition is an essential topic in computer vision and image processing. Graph convolutional networks (GCNs) have attracted significant attention and achieved noteworthy performance in skeleton-based human action recognition. However, most previous graph-based methods refine the skeleton topology without considering the types of the joints and edges or the occurrence order of the frames, which leaves them unable to fully represent intrinsic semantic information. In contrast, we propose a dynamic semantic-based spatial-temporal graph convolution network (DS-STGCN) to address this limitation. DS-STGCN has two dynamic semantic modules, one for the spatial context and one for the temporal context. Specifically, the joint and edge types are encoded implicitly in the spatial module, and the occurrence order of frames is encoded implicitly in the temporal module. Extensive experiments on four datasets, NTU-RGB+D 60, NTU-RGB+D 120, Kinetics-400, and FineGYM, show that the two proposed semantic modules bring consistent improvements in recognition performance across various backbones. DS-STGCN also surpasses state-of-the-art methods on these datasets. Notably, on the more challenging Kinetics-400 dataset, our model outperforms other state-of-the-art GCN-based methods by a large margin. The code has been released at https://github.com/davelailai/DS-STGCN.
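To make concrete what "joint type" and "frame occurrence order" mean as semantic information, here is a minimal sketch that tags each joint of a toy skeleton sequence with explicit one-hot codes. This is an illustrative stand-in only: the abstract states that DS-STGCN encodes these semantics implicitly inside its spatial and temporal modules, and this sketch does not reproduce that mechanism. The function names `one_hot` and `tag_with_semantics` and the toy coordinates are hypothetical.

```python
def one_hot(index, size):
    """Return a list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def tag_with_semantics(sequence):
    """Append joint-type and frame-order one-hot codes to every joint.

    `sequence` is a list of frames; each frame is a list of joints;
    each joint is a list of coordinates, e.g. [x, y].
    Assumption: the joint index stands in for the joint type.
    """
    num_frames = len(sequence)
    num_joints = len(sequence[0])
    tagged = []
    for t, frame in enumerate(sequence):
        tagged_frame = []
        for j, coords in enumerate(frame):
            # coordinates + which joint this is + when the frame occurs
            features = list(coords) + one_hot(j, num_joints) + one_hot(t, num_frames)
            tagged_frame.append(features)
        tagged.append(tagged_frame)
    return tagged


# Toy sequence (hypothetical values): 2 frames, 3 joints, 2-D coordinates.
toy = [
    [[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.1, 0.0], [0.1, 1.0], [1.1, 1.0]],
]
tagged = tag_with_semantics(toy)
```

With these tags, two joints that share the same coordinates but differ in type or in frame position no longer look identical to a downstream model, which is the kind of information the abstract says topology-only methods leave out.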
Pages: 6691-6704
Number of pages: 14