Motion-Guided Graph Convolutional Network for Human Action Recognition

Authors
Li, Jingjing [1 ]
Huang, Zhangjin [1 ,2 ]
Zou, Lu [1 ]
Affiliations
[1] School of Data Science, University of Science and Technology of China, Hefei
[2] School of Computer Science and Technology, University of Science and Technology of China, Hefei
Source
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics | 2024, Vol. 36, No. 07
Keywords
action recognition; graph convolution; human skeleton; motion-guided topology;
DOI
10.3724/SP.J.1089.2024.19898
Abstract
Current skeleton-based human action recognition methods cannot model how the dependencies between joints change over time, nor the interaction of information across space and time. To address these problems, a novel motion-guided graph convolutional network (M-GCN) is proposed. First, high-level motion features are extracted from the skeleton sequence. Second, the predefined graphs and the learnable graphs are refined along the time dimension by motion-dependent correlations, capturing distinct joint dependencies at each time step, i.e., the motion-guided topologies. Third, these motion-guided topologies drive the spatial graph convolutions, and motion information is fused into them to realize the interaction of spatial-temporal information. Finally, spatial and temporal graph convolutions are applied alternately to achieve accurate human action recognition. Compared with graph convolution methods such as MS-G3D on the NTU-RGB+D and NTU-RGB+D 120 datasets, the proposed method improves accuracy to 92.3% and 96.7% on the cross-subject and cross-view benchmarks of NTU-RGB+D, respectively, and to 88.8% and 90.2% on the cross-subject and cross-setup benchmarks of NTU-RGB+D 120, respectively. © 2024 Institute of Computing Technology. All rights reserved.
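The pipeline described in the abstract (motion extraction, motion-guided topology refinement, spatial graph convolution) can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: all function names, shapes, and the tanh-bounded motion-correlation term are assumptions, and the learnable graphs and temporal convolutions of the full M-GCN are omitted.

```python
import numpy as np

# Hypothetical sketch of one motion-guided graph convolution step.
# Skeleton sequence x has shape (T, V, C): T frames, V joints, C channels.

def motion_features(x):
    """High-level motion approximated here as frame-to-frame joint differences."""
    m = np.zeros_like(x)
    m[1:] = x[1:] - x[:-1]
    return m

def motion_guided_topology(base_adj, motion):
    """Refine a predefined graph with per-frame motion correlations.
    base_adj: (V, V) skeleton graph; motion: (T, V, C).
    Returns one topology per frame, shape (T, V, V)."""
    corr = np.einsum('tvc,twc->tvw', motion, motion)  # joint-pair motion similarity
    corr = np.tanh(corr)                              # bounded refinement term (assumed)
    return base_adj[None] + corr                      # time-varying joint dependencies

def spatial_graph_conv(x, adj_t, weight):
    """Aggregate neighbors with the per-frame topology, then transform features."""
    agg = np.einsum('tvw,twc->tvc', adj_t, x)         # motion-guided aggregation
    return agg @ weight                               # (T, V, C_out)

# Toy usage with random data and an identity "skeleton" graph.
T, V, C, C_out = 4, 5, 3, 8
rng = np.random.default_rng(0)
x = rng.normal(size=(T, V, C))
base_adj = np.eye(V)                                  # placeholder for the body graph
w = rng.normal(size=(C, C_out))

m = motion_features(x)
adj_t = motion_guided_topology(base_adj, m)
y = spatial_graph_conv(x, adj_t, w)
print(y.shape)  # one feature map per frame and joint
```

In the full method these spatial steps would alternate with temporal graph convolutions, and the topology would also include learnable graph components.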
Pages: 1077-1086 (9 pages)