Motion-Guided Graph Convolutional Network for Human Action Recognition

Authors
Li, Jingjing [1 ]
Huang, Zhangjin [1 ,2 ]
Zou, Lu [1 ]
Affiliations
[1] School of Data Science, University of Science and Technology of China, Hefei
[2] School of Computer Science and Technology, University of Science and Technology of China, Hefei
Source
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics | 2024, Vol. 36, No. 07
Keywords
action recognition; graph convolution; human skeleton; motion-guided topology;
DOI
10.3724/SP.J.1089.2024.19898
Abstract
Current skeleton-based human action recognition methods cannot model how the dependencies between joints change over time, nor the interaction of information across space and time. To address these problems, a novel motion-guided graph convolutional network (M-GCN) is proposed. First, high-level motion features are extracted from the skeleton sequence. Second, the predefined graphs and the learnable graphs are refined along the time dimension by motion-dependent correlations, capturing distinct joint dependencies at each time step, i.e., the motion-guided topologies. Third, these motion-guided topologies drive the spatial graph convolutions, and motion information is fused into them to realize the interaction of spatial-temporal information. Finally, spatial and temporal graph convolutions are applied alternately to achieve accurate human action recognition. Compared with graph convolution methods such as MS-G3D on the NTU-RGB+D and NTU-RGB+D 120 datasets, the proposed method improves accuracy to 92.3% and 96.7% on the cross-subject and cross-view benchmarks of NTU-RGB+D, respectively, and to 88.8% and 90.2% on the cross-subject and cross-setup benchmarks of NTU-RGB+D 120, respectively. © 2024 Institute of Computing Technology. All rights reserved.
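The pipeline described in the abstract (motion extraction, motion-guided topology refinement, spatial graph convolution) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: all function names, shapes, and the tanh-bounded motion-correlation term are assumptions, and the learnable graphs and temporal convolutions of the full M-GCN are omitted.

```python
import numpy as np

# Hypothetical sketch of one motion-guided graph convolution step.
# Skeleton sequence x has shape (T, V, C): T frames, V joints, C channels.

def motion_features(x):
    """High-level motion approximated here as frame-to-frame joint differences."""
    m = np.zeros_like(x)
    m[1:] = x[1:] - x[:-1]
    return m

def motion_guided_topology(base_adj, motion):
    """Refine a predefined graph with per-frame motion correlations.
    base_adj: (V, V) skeleton graph; motion: (T, V, C).
    Returns one topology per frame, shape (T, V, V)."""
    corr = np.einsum('tvc,twc->tvw', motion, motion)  # joint-pair motion similarity
    corr = np.tanh(corr)                              # bounded refinement term (assumed)
    return base_adj[None] + corr                      # time-varying joint dependencies

def spatial_graph_conv(x, adj_t, weight):
    """Aggregate neighbors with the per-frame topology, then transform features."""
    agg = np.einsum('tvw,twc->tvc', adj_t, x)         # motion-guided aggregation
    return agg @ weight                               # (T, V, C_out)

# Toy usage with random data and an identity "skeleton" graph.
T, V, C, C_out = 4, 5, 3, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(T, V, C))
base_adj = np.eye(V)                                  # placeholder for the body graph
w = rng.normal(size=(C, C_out))

m = motion_features(x)
adj_t = motion_guided_topology(base_adj, m)
y = spatial_graph_conv(x, adj_t, w)
print(y.shape)  # one feature map per frame and joint
```

In the full method these spatial steps would alternate with temporal graph convolutions, and the topology would also include learnable graph components.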
Pages: 1077-1086 (9 pages)