Revisiting Skeleton-based Action Recognition

Cited by: 375
Authors
Duan, Haodong [1, 3]
Zhao, Yue [2]
Chen, Kai [3, 5]
Lin, Dahua [1, 3]
Dai, Bo [3, 4]
Affiliations
[1] Chinese Univ HongKong, Hong Kong, Peoples R China
[2] Univ Texas Austin, Austin, TX 78712 USA
[3] Shanghai AI Lab, Shanghai, Peoples R China
[4] Nanyang Technol Univ, S Lab, Singapore, Singapore
[5] SenseTime Res, Shenzhen, Peoples R China
Source
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022) | 2022
DOI
10.1109/CVPR52688.2022.00298
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Human skeleton, as a compact representation of human action, has received increasing attention in recent years. Many skeleton-based action recognition methods adopt GCNs to extract features on top of human skeletons. Despite the positive results shown in these attempts, GCN-based methods are subject to limitations in robustness, interoperability, and scalability. In this work, we propose PoseConv3D, a new approach to skeleton-based action recognition. PoseConv3D relies on a 3D heatmap volume instead of a graph sequence as the base representation of human skeletons. Compared to GCN-based methods, PoseConv3D is more effective in learning spatiotemporal features, more robust against pose estimation noise, and generalizes better in cross-dataset settings. Also, PoseConv3D can handle multiple-person scenarios without additional computation costs. The hierarchical features can be easily integrated with other modalities at early fusion stages, providing a large design space for boosting performance. PoseConv3D achieves state-of-the-art results on five of six standard skeleton-based action recognition benchmarks. Once fused with other modalities, it achieves state-of-the-art results on all eight multi-modality action recognition benchmarks. Code has been made available at: https://github.com/kennymckormick/pyskl.
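As a rough illustration of the "3D heatmap volume" representation named in the abstract, the sketch below stacks per-joint 2D heatmaps along the time axis. It is a minimal sketch and not the authors' implementation: the Gaussian form of each heatmap, the function name `keypoints_to_heatmap_volume`, the `sigma` parameter, and the joints x frames x height x width layout are all assumptions made for illustration, since the abstract does not specify them.

```python
import math


def keypoints_to_heatmap_volume(keypoints, height, width, sigma=1.0):
    """Build a heatmap volume from 2D keypoints (hypothetical helper).

    keypoints[t][k] is an (x, y, score) tuple for joint k in frame t.
    Returns a nested list indexed as volume[k][t][row][col].
    Assumption: each joint is drawn as a Gaussian scaled by its score.
    """
    num_frames = len(keypoints)
    num_joints = len(keypoints[0])
    volume = [[[[0.0] * width for _ in range(height)]
               for _ in range(num_frames)]
              for _ in range(num_joints)]
    for t, frame in enumerate(keypoints):
        for k, (x, y, score) in enumerate(frame):
            for row in range(height):
                for col in range(width):
                    dist_sq = (col - x) ** 2 + (row - y) ** 2
                    volume[k][t][row][col] = score * math.exp(
                        -dist_sq / (2 * sigma ** 2))
    return volume


# Two frames, two joints, on a tiny 8x8 grid (made-up coordinates).
poses = [
    [(2.0, 3.0, 0.9), (5.0, 5.0, 0.8)],
    [(3.0, 3.0, 1.0), (5.0, 4.0, 0.7)],
]
volume = keypoints_to_heatmap_volume(poses, height=8, width=8)
shape = (len(volume), len(volume[0]), len(volume[0][0]), len(volume[0][0][0]))
print(shape)
```

Because the result is a dense grid rather than a graph, it can be passed to an ordinary 3D convolutional network, which is the contrast with GCN-based methods that the abstract draws.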
Pages: 2959-2968
Page count: 10