Symbiotic Graph Neural Networks for 3D Skeleton-Based Human Action Recognition and Motion Prediction

Cited by: 141
Authors
Li, Maosen [1, 2]
Chen, Siheng [1, 2]
Chen, Xu [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Tian, Qi [3, 4]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai Key Lab Multimedia Proc & Transmiss, Shanghai, Peoples R China
[3] Huawei Cloud & AI, Shenzhen 518129, Peoples R China
[4] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
Keywords
Feature extraction; Three-dimensional displays; Magnetic heads; Joints; Convolution; Task analysis; Symbiosis; 3D skeleton-based action recognition; motion prediction; multiscale graph convolution networks; graph inference;
DOI
10.1109/TPAMI.2021.3053765
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D skeleton-based action recognition and motion prediction are two essential problems of human activity understanding. Many previous works 1) study the two tasks separately, neglecting their internal correlations, and 2) do not capture sufficient relations inside the body. To address these issues, we propose a symbiotic model that handles the two tasks jointly, and we propose two scales of graphs to explicitly capture relations among body-joints and body-parts. Together, these form symbiotic graph neural networks, which contain a backbone, an action-recognition head, and a motion-prediction head. The two heads are trained jointly and enhance each other. For the backbone, we propose multi-branch multiscale graph convolution networks to extract spatial and temporal features. The multiscale graph convolution networks are based on joint-scale and part-scale graphs. The joint-scale graphs contain actional graphs, which capture action-based relations, and structural graphs, which capture physical constraints. The part-scale graphs integrate body-joints to form specific parts, representing high-level relations. Moreover, dual bone-based graphs and networks are proposed to learn complementary features. We conduct extensive experiments on skeleton-based action recognition and motion prediction with four datasets: NTU-RGB+D, Kinetics, Human3.6M, and CMU Mocap. Experiments show that our symbiotic graph neural networks achieve better performance on both tasks than state-of-the-art methods.
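The joint-scale, part-scale, and bone-based ideas in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation. The 5-joint toy skeleton, the part grouping, and all function names below are hypothetical, and the row-normalized adjacency is only a simple stand-in for a structural graph. Learned weights, actional graphs, temporal modeling, and the two task heads are all omitted.

```python
# Hypothetical 5-joint toy skeleton (an assumption for illustration only,
# not the joint layout of any dataset named in the abstract).
# 0: torso, 1: left hand, 2: right hand, 3: left foot, 4: right foot
EDGES = [(0, 1), (0, 2), (0, 3), (0, 4)]  # (parent, child) pairs
PARTS = {"torso": [0], "arms": [1, 2], "legs": [3, 4]}  # assumed grouping


def normalized_adjacency(num_joints, edges):
    """Row-normalized adjacency with self-loops (structural-graph stand-in)."""
    a = [[1.0 if i == j else 0.0 for j in range(num_joints)]
         for i in range(num_joints)]
    for i, j in edges:
        a[i][j] = a[j][i] = 1.0
    return [[v / sum(row) for v in row] for row in a]


def graph_aggregate(adj, feats):
    """One propagation step: each joint averages itself and its neighbors."""
    dim = len(feats[0])
    return [
        [sum(adj[i][j] * feats[j][d] for j in range(len(feats)))
         for d in range(dim)]
        for i in range(len(feats))
    ]


def pool_to_parts(feats, parts):
    """Part-scale features: mean of the joint features assigned to each part."""
    dim = len(feats[0])
    return {
        name: [sum(feats[j][d] for j in idx) / len(idx) for d in range(dim)]
        for name, idx in parts.items()
    }


def bone_vectors(joints, edges):
    """Bone features: child joint position minus parent joint position."""
    return [
        [c - p for p, c in zip(joints[parent], joints[child])]
        for parent, child in edges
    ]
```

Calling `graph_aggregate(normalized_adjacency(5, EDGES), joints)` on a list of five 3D joint positions smooths each joint with its skeletal neighbors, `pool_to_parts` then summarizes the result at the coarser part scale, and `bone_vectors` gives the complementary bone-based input.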
Pages: 3316-3333
Page count: 18