Symbiotic Graph Neural Networks for 3D Skeleton-Based Human Action Recognition and Motion Prediction

Cited by: 141
Authors
Li, Maosen [1, 2]
Chen, Siheng [1, 2]
Chen, Xu [1, 2]
Zhang, Ya [1, 2]
Wang, Yanfeng [1, 2]
Tian, Qi [3, 4]
Affiliations
[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai, Peoples R China
[2] Shanghai Jiao Tong Univ, Shanghai Key Lab Multimedia Proc & Transmiss, Shanghai, Peoples R China
[3] Huawei Cloud & AI, Shenzhen 518129, Peoples R China
[4] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA
Keywords
Feature extraction; Three-dimensional displays; Magnetic heads; Joints; Convolution; Task analysis; Symbiosis; 3D skeleton-based action recognition; motion prediction; multiscale graph convolution networks; graph inference;
DOI
10.1109/TPAMI.2021.3053765
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
3D skeleton-based action recognition and motion prediction are two essential problems of human activity understanding. In many previous works: 1) they studied two tasks separately, neglecting internal correlations; and 2) they did not capture sufficient relations inside the body. To address these issues, we propose a symbiotic model to handle two tasks jointly; and we propose two scales of graphs to explicitly capture relations among body-joints and body-parts. Together, we propose symbiotic graph neural networks, which contain a backbone, an action-recognition head, and a motion-prediction head. Two heads are trained jointly and enhance each other. For the backbone, we propose multi-branch multiscale graph convolution networks to extract spatial and temporal features. The multiscale graph convolution networks are based on joint-scale and part-scale graphs. The joint-scale graphs contain actional graphs, capturing action-based relations, and structural graphs, capturing physical constraints. The part-scale graphs integrate body-joints to form specific parts, representing high-level relations. Moreover, dual bone-based graphs and networks are proposed to learn complementary features. We conduct extensive experiments for skeleton-based action recognition and motion prediction with four datasets, NTU-RGB+D, Kinetics, Human3.6M, and CMU Mocap. Experiments show that our symbiotic graph neural networks achieve better performances on both tasks compared to the state-of-the-art methods.
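The two graph scales described in the abstract can be illustrated with a minimal sketch. It is not the authors' architecture: it has no learned weights, no actional graphs, and no temporal convolution. The 5-joint skeleton, the part grouping, and the names `EDGES`, `PARTS`, `graph_conv`, and `pool_to_parts` are all hypothetical and chosen only for illustration. The sketch shows one feature-propagation step over a structural (bone-connectivity) graph at joint scale, followed by averaging joints into body-parts.

```python
# Hypothetical toy skeleton: 0=torso, 1=left elbow, 2=left hand,
# 3=right elbow, 4=right hand. Edges follow the bones (structural graph).
EDGES = [(0, 1), (1, 2), (0, 3), (3, 4)]

# Hypothetical grouping of joints into body-parts (part scale).
PARTS = {"torso": [0], "left_arm": [1, 2], "right_arm": [3, 4]}


def normalized_adjacency(num_nodes, edges):
    """Return the row-normalized adjacency with self-loops, D^-1 (A + I)."""
    adj = [[1.0 if i == j else 0.0 for j in range(num_nodes)]
           for i in range(num_nodes)]
    for i, j in edges:
        adj[i][j] = adj[j][i] = 1.0
    return [[value / sum(row) for value in row] for row in adj]


def graph_conv(adj, feats):
    """One propagation step: each joint averages itself and its neighbors.

    A real graph convolution layer would also multiply by a learned
    weight matrix; that is omitted here to keep the sketch short.
    """
    dim = len(feats[0])
    return [[sum(adj[i][k] * feats[k][d] for k in range(len(feats)))
             for d in range(dim)]
            for i in range(len(adj))]


def pool_to_parts(feats, parts):
    """Average the joint features belonging to each body-part."""
    dim = len(feats[0])
    return {name: [sum(feats[j][d] for j in joints) / len(joints)
                   for d in range(dim)]
            for name, joints in parts.items()}


# 3D coordinates of each joint for a single frame (made-up values).
joints_xyz = [
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [2.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [-2.0, 0.0, 0.0],
]

adjacency = normalized_adjacency(len(joints_xyz), EDGES)
joint_feats = graph_conv(adjacency, joints_xyz)    # joint-scale features
part_feats = pool_to_parts(joint_feats, PARTS)     # part-scale features
```

In this toy setup, the part-scale features summarize each limb with a single vector, which is one simple way to read the abstract's statement that part-scale graphs "integrate body-joints to form specific parts".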
Pages: 3316-3333
Page count: 18