Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition

Citations: 222
Authors
Chen, Tailin [1, 3, 4]
Zhou, Desen [2]
Wang, Jian [2]
Wang, Shidong [1]
Guan, Yu [1]
He, Xuming [3]
Ding, Errui [2]
Affiliations
[1] Newcastle Univ, Open Lab, Newcastle Upon Tyne, Tyne & Wear, England
[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing, Peoples R China
[3] ShanghaiTech Univ, Shanghai, Peoples R China
[4] Baidu VIS, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
Engineering and Physical Sciences Research Council (UK);
Keywords
Action Recognition; Skeleton-based; Multi-granular; Spatial temporal; attention; DualHead-Net;
DOI
10.1145/3474085.3475574
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Skeleton-based action recognition remains a core challenge in human-centred scene understanding because human motion spans multiple granularities and varies widely. Existing approaches typically employ a single neural representation for different motion patterns, which makes it difficult to capture fine-grained action classes given limited training data. To address these problems, we propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification that jointly models coarse- and fine-grained skeleton motion patterns. To this end, we develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions effectively and efficiently. Moreover, our network utilises a cross-head communication strategy to mutually enhance the representations of both heads. We conducted extensive experiments on three large-scale datasets, namely NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton, and achieve state-of-the-art performance on all the benchmarks, which validates the effectiveness of our method.
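The dual-head idea in the abstract can be illustrated with a small sketch. This is not the authors' DualHead-Net implementation: the record gives no architectural details, so the function names, the mean-over-neighbours graph step, the stride-2 temporal pooling, and the sigmoid gating used for cross-head communication are all assumptions chosen only to show two resolutions exchanging information. Each frame holds one scalar feature per joint.

```python
import math


def normalize_adjacency(edges, num_joints):
    """Row-normalised skeleton adjacency with self-loops (hypothetical helper)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_joints)]
           for i in range(num_joints)]
    for i, j in edges:
        adj[i][j] = 1.0
        adj[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in adj]


def graph_aggregate(frames, adj):
    """Average each joint's feature with its neighbours, frame by frame."""
    return [[sum(a * x for a, x in zip(row, frame)) for row in adj]
            for frame in frames]


def temporal_pool(frames, stride=2, reduce=None):
    """Merge every `stride` consecutive frames (mean by default)."""
    reduce = reduce or (lambda col: sum(col) / len(col))
    return [[reduce(col) for col in zip(*frames[s:s + stride])]
            for s in range(0, len(frames) - stride + 1, stride)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dual_head_step(frames, adj, stride=2):
    """One assumed block: a fine head, a coarse head, and mutual gating."""
    fine = graph_aggregate(frames, adj)                          # full temporal resolution
    coarse = graph_aggregate(temporal_pool(frames, stride), adj)  # reduced resolution

    # Coarse -> fine: each coarse frame gates the fine frames it summarises.
    last = len(coarse) - 1
    fine_out = [[x * sigmoid(coarse[min(t // stride, last)][v])
                 for v, x in enumerate(frame)]
                for t, frame in enumerate(fine)]

    # Fine -> coarse: the per-window peak of the fine features gates the coarse head.
    fine_peak = temporal_pool(fine, stride, reduce=max)
    coarse_out = [[x * sigmoid(fine_peak[t][v]) for v, x in enumerate(frame)]
                  for t, frame in enumerate(coarse)]
    return fine_out, coarse_out


# Toy input: a 3-joint chain observed over 4 frames.
adj = normalize_adjacency([(0, 1), (1, 2)], num_joints=3)
frames = [[float(t + v) for v in range(3)] for t in range(4)]
fine_out, coarse_out = dual_head_step(frames, adj)
```

With the toy input, the fine head keeps all 4 frames while the coarse head works on 2 pooled frames, and each head's output is scaled by a gate computed from the other. A real model would use learned, multi-channel graph convolutions and attention rather than these fixed scalar operations.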
Pages: 4334-4342
Page count: 9