Symbiotic Graph Neural Networks for 3D Skeleton-Based Human Action Recognition and Motion Prediction

Cited by: 141

Authors:

Li, Maosen [1,2]

Chen, Siheng [1,2]

Chen, Xu [1,2]

Zhang, Ya [1,2]

Wang, Yanfeng [1,2]

Tian, Qi [3,4]

Affiliations:

[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai, Peoples R China

[2] Shanghai Jiao Tong Univ, Shanghai Key Lab Multimedia Proc & Transmiss, Shanghai, Peoples R China

[3] Huawei Cloud & AI, Shenzhen 518129, Peoples R China

[4] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022 / Vol. 44 / No. 06

Keywords:

Feature extraction; Three-dimensional displays; Magnetic heads; Joints; Convolution; Task analysis; Symbiosis; 3D skeleton-based action recognition; motion prediction; multiscale graph convolution networks; graph inference;

DOI:

10.1109/TPAMI.2021.3053765

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

3D skeleton-based action recognition and motion prediction are two essential problems of human activity understanding. In many previous works: 1) they studied two tasks separately, neglecting internal correlations; and 2) they did not capture sufficient relations inside the body. To address these issues, we propose a symbiotic model to handle two tasks jointly; and we propose two scales of graphs to explicitly capture relations among body-joints and body-parts. Together, we propose symbiotic graph neural networks, which contain a backbone, an action-recognition head, and a motion-prediction head. Two heads are trained jointly and enhance each other. For the backbone, we propose multi-branch multiscale graph convolution networks to extract spatial and temporal features. The multiscale graph convolution networks are based on joint-scale and part-scale graphs. The joint-scale graphs contain actional graphs, capturing action-based relations, and structural graphs, capturing physical constraints. The part-scale graphs integrate body-joints to form specific parts, representing high-level relations. Moreover, dual bone-based graphs and networks are proposed to learn complementary features. We conduct extensive experiments for skeleton-based action recognition and motion prediction with four datasets, NTU-RGB+D, Kinetics, Human3.6M, and CMU Mocap. Experiments show that our symbiotic graph neural networks achieve better performances on both tasks compared to the state-of-the-art methods.
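The abstract mentions two ideas that a small example can make concrete: part-scale graphs that integrate body-joints into parts, and bone-based inputs that complement joint positions. The following is a minimal sketch only, not the authors' implementation. The 5-joint toy skeleton, the `PARENTS` and `PARTS` tables, the function names, the choice of average pooling, and the definition of a bone as the difference between a joint and its parent are all assumptions made for illustration.

```python
# Toy illustration of bone features and joint-to-part pooling.
# Everything here (skeleton layout, part grouping, pooling rule) is hypothetical.

# joint index -> parent joint index (None marks the root)
PARENTS = {0: None, 1: 0, 2: 1, 3: 0, 4: 3}

# assumed grouping of joints into body parts
PARTS = {"torso": [0], "left_arm": [1, 2], "right_arm": [3, 4]}


def bone_vectors(joints, parents):
    """Return one 3D vector per non-root joint: child position minus parent position."""
    bones = {}
    for child, parent in parents.items():
        if parent is None:
            continue  # the root has no incoming bone
        bones[child] = tuple(c - p for c, p in zip(joints[child], joints[parent]))
    return bones


def pool_parts(joints, parts):
    """Average the 3D coordinates of the joints in each part (assumed pooling rule)."""
    pooled = {}
    for name, idxs in parts.items():
        n = len(idxs)
        pooled[name] = tuple(sum(joints[i][d] for i in idxs) / n for d in range(3))
    return pooled


# One frame of 3D joint positions for the toy skeleton.
joints = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (2.0, 0.0, 0.0),
    (-1.0, 0.0, 0.0),
    (-2.0, 1.0, 0.0),
]

bones = bone_vectors(joints, PARENTS)
part_feats = pool_parts(joints, PARTS)
```

In this sketch, `bones` holds a second view of the same frame (directions and lengths rather than absolute positions), and `part_feats` holds a coarser view with one feature per body part. The network described in the abstract operates on learned features over time, which this example does not attempt to reproduce.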


Pages: 3316-3333

Page count: 18
