Revisiting Skeleton-based Action Recognition

Times cited: 375

Authors:

Duan, Haodong [1,3]

Zhao, Yue [2]

Chen, Kai [3,5]

Lin, Dahua [1,3]

Dai, Bo [3,4]

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] Univ Texas Austin, Austin, TX 78712 USA

[3] Shanghai AI Lab, Shanghai, Peoples R China

[4] Nanyang Technol Univ, S Lab, Singapore, Singapore

[5] SenseTime Res, Shenzhen, Peoples R China

Source:

2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022


DOI:

10.1109/CVPR52688.2022.00298

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt graph convolutional networks (GCNs) to extract features on top of human skeletons. Despite the positive results shown in these attempts, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition. PoseConv3D relies on a 3D heatmap volume instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseConv3D is more effective in learning spatiotemporal features, more robust against pose estimation noise, and generalizes better in cross-dataset settings. PoseConv3D can also handle multiple-person scenarios without additional computation cost. Its hierarchical features can easily be integrated with other modalities at early fusion stages, providing a large design space for boosting performance. PoseConv3D achieves state-of-the-art results on five of six standard skeleton-based action recognition benchmarks. Once fused with other modalities, it achieves state-of-the-art results on all eight multi-modality action recognition benchmarks. Code has been made available at: https://github.com/kennymckormick/pyskl.
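The abstract's central idea is to represent a skeleton sequence as a stack of per-joint 2D heatmaps over time rather than as a graph sequence. Below is a minimal, stdlib-only sketch of how such a K×T×H×W volume could be built from 2D keypoints. Several details are assumptions for illustration and do not come from this record or from the pyskl code: the Gaussian peak shape, scaling each peak by keypoint confidence, the `sigma` value, and merging several people with an element-wise max. The helper names `gaussian_map` and `heatmap_volume` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical types: a keypoint is (x, y, confidence); a pose is K keypoints.
Keypoint = Tuple[float, float, float]
Pose = Sequence[Keypoint]


def gaussian_map(x: float, y: float, c: float, h: int, w: int,
                 sigma: float) -> List[List[float]]:
    """Return an h-by-w map with a Gaussian peak of height c centred on (x, y)."""
    return [
        [c * math.exp(-((j - x) ** 2 + (i - y) ** 2) / (2 * sigma ** 2))
         for j in range(w)]
        for i in range(h)
    ]


def heatmap_volume(frames: Sequence[Sequence[Pose]], num_joints: int,
                   h: int, w: int, sigma: float = 0.6) -> List[List[List[List[float]]]]:
    """Build a K x T x H x W volume from T frames, each holding any number of poses.

    Assumption: poses of different people in the same frame are merged with an
    element-wise max, so the output size does not grow with the number of people.
    """
    volume = [[[[0.0] * w for _ in range(h)] for _ in frames]
              for _ in range(num_joints)]
    for t, people in enumerate(frames):
        for pose in people:
            for k, (x, y, c) in enumerate(pose):
                g = gaussian_map(x, y, c, h, w, sigma)
                plane = volume[k][t]
                for i in range(h):
                    for j in range(w):
                        plane[i][j] = max(plane[i][j], g[i][j])
    return volume


# Usage: 2 joints, 2 frames, 8x8 grid; frame 0 has two people, frame 1 has one.
frames = [
    [[(1.0, 2.0, 1.0), (5.0, 5.0, 0.5)], [(6.0, 1.0, 0.9), (2.0, 6.0, 1.0)]],
    [[(3.0, 3.0, 1.0), (4.0, 4.0, 1.0)]],
]
vol = heatmap_volume(frames, num_joints=2, h=8, w=8)
print(len(vol), len(vol[0]), len(vol[0][0]), len(vol[0][0][0]))  # 2 2 8 8
```

In this sketch every person is drawn into the same K×T×H×W grid. A downstream 3D CNN therefore receives an input of fixed size regardless of how many people appear in the clip. That is one way to read the abstract's claim that multi-person scenes add no computation cost.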


Pages: 2959–2968

Page count: 10

Number of references: 66

