Learning Multi-Granular Spatio-Temporal Graph Network for Skeleton-based Action Recognition

Times cited: 252

Authors:

Chen, Tailin [1, 3, 4]

Zhou, Desen [2]

Wang, Jian [2]

Wang, Shidong [1]

Guan, Yu [1]

He, Xuming [3]

Ding, Errui [2]

Affiliations:

[1] Newcastle Univ, Open Lab, Newcastle Upon Tyne, Tyne & Wear, England

[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing, Peoples R China

[3] ShanghaiTech Univ, Shanghai, Peoples R China

[4] Baidu VIS, Beijing, Peoples R China

Source:

PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021 | 2021

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Action Recognition; Skeleton-based; Multi-granular; Spatial temporal; attention; DualHead-Net;

DOI:

10.1145/3474085.3475574

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The task of skeleton-based action recognition remains a core challenge in human-centred scene understanding due to the multiple granularities and large variation in human motion. Existing approaches typically employ a single neural representation for different motion patterns, which makes it difficult to capture fine-grained action classes given limited training data. To address these problems, we propose a novel multi-granular spatio-temporal graph network for skeleton-based action classification that jointly models coarse- and fine-grained skeleton motion patterns. To this end, we develop a dual-head graph network consisting of two interleaved branches, which enables us to extract features at two spatio-temporal resolutions in an effective and efficient manner. Moreover, our network utilises a cross-head communication strategy to mutually enhance the representations of both heads. We conducted extensive experiments on three large-scale datasets, namely NTU RGB+D 60, NTU RGB+D 120, and Kinetics-Skeleton, and achieved state-of-the-art performance on all the benchmarks, which validates the effectiveness of our method.
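The abstract describes the dual-head design only at a high level. As a rough illustration of what "two interleaved branches at two spatio-temporal resolutions" with "cross-head communication" could look like, here is a minimal NumPy sketch. It is not the authors' DualHead-Net: the toy 5-joint chain skeleton, the function names (`normalize_adjacency`, `graph_conv`, `temporal_pool`, `temporal_upsample`, `dual_head_block`), the use of temporal average pooling to build the coarse head, and the additive feature exchange between heads are all hypothetical choices made for illustration. The spatial step uses the symmetrically normalized adjacency of a standard graph convolutional network.

```python
import numpy as np

def normalize_adjacency(A):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} of a skeleton graph."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def graph_conv(X, A_norm, W):
    """Per-frame spatial graph convolution with ReLU.
    X: (T, V, C_in) frames x joints x channels, W: (C_in, C_out)."""
    aggregated = np.einsum("uv,tvc->tuc", A_norm, X)  # mix neighbouring joints
    return np.maximum(aggregated @ W, 0.0)            # project channels, ReLU

def temporal_pool(X, stride):
    """Average-pool along time: (T, V, C) -> (T // stride, V, C)."""
    T = (X.shape[0] // stride) * stride
    return X[:T].reshape(T // stride, stride, *X.shape[1:]).mean(axis=1)

def temporal_upsample(X, stride):
    """Nearest-neighbour upsampling along time (each frame repeated)."""
    return np.repeat(X, stride, axis=0)

def dual_head_block(X_fine, X_coarse, A_norm, W_fine, W_coarse, stride):
    """Hypothetical block: each head convolves at its own temporal
    resolution, then the heads exchange resampled features (assumed
    additive form of cross-head communication)."""
    H_fine = graph_conv(X_fine, A_norm, W_fine)
    H_coarse = graph_conv(X_coarse, A_norm, W_coarse)
    out_fine = H_fine + temporal_upsample(H_coarse, stride)
    out_coarse = H_coarse + temporal_pool(H_fine, stride)
    return out_fine, out_coarse

rng = np.random.default_rng(0)
V, T, C_in, C_out, stride = 5, 8, 3, 4, 2

# Toy 5-joint chain skeleton (hypothetical, not a real dataset layout)
A = np.zeros((V, V))
for i in range(V - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
A_norm = normalize_adjacency(A)

X = rng.standard_normal((T, V, C_in))      # fine head: full frame rate
X_coarse = temporal_pool(X, stride)        # coarse head: half frame rate
W_f = rng.standard_normal((C_in, C_out))
W_c = rng.standard_normal((C_in, C_out))

f, c = dual_head_block(X, X_coarse, A_norm, W_f, W_c, stride)

# Global average over time and joints, then concatenate both heads
feat = np.concatenate([f.mean(axis=(0, 1)), c.mean(axis=(0, 1))])
print(f.shape, c.shape, feat.shape)  # (8, 5, 4) (4, 5, 4) (8,)
```

The concatenated vector `feat` is one feature per sequence that a linear classifier could map to action classes. A real model would stack many such blocks with learned weights and would likely use a richer exchange than plain addition; the sketch only shows how two temporal resolutions can share one skeleton graph and swap information.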

Pages: 4334-4342

Number of pages: 9

References: 41 in total

