Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition

Citations: 252
Authors
Chen, Tailin [1 ,3 ,4 ]
Zhou, Desen [2 ]
Wang, Jian [2 ]
Wang, Shidong [1 ]
Guan, Yu [1 ]
He, Xuming [3 ]
Ding, Errui [2 ]
Affiliations
[1] Newcastle Univ, Open Lab, Newcastle Upon Tyne, Tyne & Wear, England
[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing, Peoples R China
[3] ShanghaiTech Univ, Shanghai, Peoples R China
[4] Baidu VIS, Beijing, Peoples R China
Source
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Action Recognition; Skeleton-based; Multi-granular; Spatial temporal; attention; DualHead-Net;
DOI
10.1145/3474085.3475574
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Skeleton-based action recognition remains a core challenge in human-centred scene understanding because human motion spans multiple granularities and varies widely. Existing approaches typically employ a single neural representation for different motion patterns, which makes it difficult to capture fine-grained action classes given limited training data. To address these problems, we propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification that jointly models coarse- and fine-grained skeleton motion patterns. To this end, we develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions effectively and efficiently. Moreover, our network utilises a cross-head communication strategy to mutually enhance the representations of both heads. We conducted extensive experiments on three large-scale datasets, namely NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton, and achieve state-of-the-art performance on all the benchmarks, which validates the effectiveness of our method.
Pages: 4334-4342
Number of pages: 9