Symbiotic Graph Neural Networks for 3D Skeleton-Based Human Action Recognition and Motion Prediction

Cited by: 141

Authors:

Li, Maosen [1,2]

Chen, Siheng [1,2]

Chen, Xu [1,2]

Zhang, Ya [1,2]

Wang, Yanfeng [1,2]

Tian, Qi [3,4]

Affiliations:

[1] Shanghai Jiao Tong Univ, Cooperat Medianet Innovat Ctr, Shanghai, Peoples R China

[2] Shanghai Jiao Tong Univ, Shanghai Key Lab Multimedia Proc & Transmiss, Shanghai, Peoples R China

[3] Huawei Cloud & AI, Shenzhen 518129, Peoples R China

[4] Univ Texas San Antonio, Dept Comp Sci, San Antonio, TX 78249 USA

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2022, Vol. 44, No. 6

Keywords:

Feature extraction; Three-dimensional displays; Magnetic heads; Joints; Convolution; Task analysis; Symbiosis; 3D skeleton-based action recognition; motion prediction; multiscale graph convolution networks; graph inference;

DOI:

10.1109/TPAMI.2021.3053765

Chinese Library Classification (CLC):

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

3D skeleton-based action recognition and motion prediction are two essential problems in human activity understanding. Many previous works 1) studied the two tasks separately, neglecting their internal correlations; and 2) did not capture sufficient relations inside the body. To address these issues, we propose a symbiotic model that handles the two tasks jointly, and we propose two scales of graphs to explicitly capture relations among body-joints and body-parts. Together, we propose symbiotic graph neural networks, which contain a backbone, an action-recognition head, and a motion-prediction head. The two heads are trained jointly and enhance each other. For the backbone, we propose multi-branch multiscale graph convolution networks to extract spatial and temporal features. The multiscale graph convolution networks are based on joint-scale and part-scale graphs. The joint-scale graphs contain actional graphs, capturing action-based relations, and structural graphs, capturing physical constraints. The part-scale graphs integrate body-joints to form specific parts, representing high-level relations. Moreover, dual bone-based graphs and networks are proposed to learn complementary features. We conduct extensive experiments on skeleton-based action recognition and motion prediction with four datasets: NTU-RGB+D, Kinetics, Human3.6M, and CMU Mocap. Experiments show that our symbiotic graph neural networks achieve better performance on both tasks than the state-of-the-art methods.
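
As a rough illustration of two ideas named in the abstract, bone-based features and part-scale grouping of joints, the following is a minimal sketch in plain Python. It is not the authors' implementation: the 5-joint toy skeleton, the `PARENTS` and `PARTS` tables, the function names, and the use of simple averaging to pool joints into parts are all assumptions made for this example.

```python
# Illustrative sketch only; the skeleton, part grouping, and pooling rule
# below are hypothetical and are not taken from the paper.

# Toy 5-joint skeleton: parent index of each joint (-1 marks the root).
PARENTS = [-1, 0, 1, 1, 1]

# Hypothetical grouping of joint indices into body parts.
PARTS = {"torso": [0, 1], "head": [2], "arms": [3, 4]}


def bone_vectors(joints, parents=PARENTS):
    """Return one 3D bone vector per non-root joint (child minus parent)."""
    bones = []
    for child, parent in enumerate(parents):
        if parent < 0:
            continue  # the root joint has no incoming bone
        bones.append(tuple(c - p for c, p in zip(joints[child], joints[parent])))
    return bones


def part_features(joints, parts=PARTS):
    """Pool joints into parts by averaging coordinates (assumed pooling rule)."""
    pooled = {}
    for name, idx in parts.items():
        n = len(idx)
        pooled[name] = tuple(sum(joints[i][d] for i in idx) / n for d in range(3))
    return pooled


# One frame of 3D joint positions for the toy skeleton.
frame = [
    (0.0, 0.0, 0.0),   # hip (root)
    (0.0, 1.0, 0.0),   # chest
    (0.0, 2.0, 0.0),   # head
    (-1.0, 1.0, 0.0),  # left hand
    (1.0, 1.0, 0.0),   # right hand
]
bones = bone_vectors(frame)
parts = part_features(frame)
```

Here `bones` holds four vectors (one per non-root joint) and `parts` holds one pooled 3D coordinate per part. A real model would apply learned graph convolutions over such joint-scale and part-scale representations rather than use them directly.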


Pages: 3316-3333

Page count: 18
