Motion-Guided Graph Convolutional Network for Human Action Recognition

Cited by: 0
Authors
Li, Jingjing [1 ]
Huang, Zhangjin [1 ,2 ]
Zou, Lu [1 ]
Affiliations
[1] School of Data Science, University of Science and Technology of China, Hefei
[2] School of Computer Science and Technology, University of Science and Technology of China, Hefei
Source
Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics | 2024 / Vol. 36 / No. 07
Keywords
action recognition; graph convolution; human skeleton; motion-guided topology;
DOI
10.3724/SP.J.1089.2024.19898
Abstract
Current skeleton-based human action recognition methods cannot model how the dependencies between joints change over time, nor the interaction of information across space and time. To address these problems, a novel motion-guided graph convolutional network (M-GCN) is proposed. Firstly, high-level motion features are extracted from the skeleton sequence. Secondly, the predefined graphs and the learnable graphs are optimized by motion-dependent correlations along the time dimension, so that different joint dependencies, i.e., the motion-guided topologies, are captured at each time step. Thirdly, the motion-guided topologies are used for spatial graph convolutions, and motion information is fused into the spatial graph convolutions to realize the interaction of spatial-temporal information. Finally, spatial and temporal graph convolutions are applied alternately to achieve accurate human action recognition. Compared with graph convolution methods such as MS-G3D on the NTU-RGB+D and NTU-RGB+D 120 datasets, the proposed method improves accuracy to 92.3% and 96.7% on the cross-subject and cross-view benchmarks of NTU-RGB+D, respectively, and to 88.8% and 90.2% on the cross-subject and cross-setup benchmarks of NTU-RGB+D 120, respectively. © 2024 Institute of Computing Technology. All rights reserved.
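The abstract outlines three components: frame-wise motion features, motion-guided topologies that refine predefined and learnable graphs per time step, and spatial graph convolution with those topologies. The sketch below is a minimal, hypothetical PyTorch rendering of that pipeline, not the authors' implementation; the module name MotionGuidedGraphConv, the use of frame differences as motion features, and the softmax-normalized correlation are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' code) of a motion-guided graph convolution block.
# Assumptions: joint features x of shape (N, C, T, V); motion features are frame
# differences; the topology is refined per frame by motion correlations.
import torch
import torch.nn as nn


class MotionGuidedGraphConv(nn.Module):
    def __init__(self, in_channels, out_channels, num_joints, adjacency):
        super().__init__()
        # Predefined skeleton graph (V x V), e.g. the bone connections of NTU-RGB+D.
        self.register_buffer("A", adjacency)
        # Learnable residual graph shared across all samples and frames.
        self.B = nn.Parameter(torch.zeros(num_joints, num_joints))
        # Embeddings that turn motion features into joint-joint correlations.
        self.theta = nn.Conv2d(in_channels, out_channels // 4, kernel_size=1)
        self.phi = nn.Conv2d(in_channels, out_channels // 4, kernel_size=1)
        self.proj = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, V) joint features.
        # 1) High-level motion features: temporal differences between frames.
        motion = x[:, :, 1:] - x[:, :, :-1]
        motion = torch.cat([motion, motion[:, :, -1:]], dim=2)  # keep T frames

        # 2) Motion-dependent correlations per frame -> motion-guided topology.
        q = self.theta(motion)                       # (N, C', T, V)
        k = self.phi(motion)                         # (N, C', T, V)
        corr = torch.einsum("nctv,nctw->ntvw", q, k)
        corr = torch.softmax(corr, dim=-1)           # (N, T, V, V)
        topo = self.A + self.B + corr                # predefined + learnable + motion

        # 3) Spatial graph convolution with the time-varying topology.
        feat = self.proj(x)                          # (N, C_out, T, V)
        out = torch.einsum("nctv,ntvw->nctw", feat, topo)
        return out


if __name__ == "__main__":
    V = 25                      # number of joints in NTU-RGB+D
    A = torch.eye(V)            # placeholder adjacency; the real one encodes bone links
    layer = MotionGuidedGraphConv(3, 64, V, A)
    x = torch.randn(2, 3, 16, V)   # (batch, channels, frames, joints)
    print(layer(x).shape)          # torch.Size([2, 64, 16, 25])
```

In the paper's design, such spatial blocks would alternate with temporal convolutions over the frame axis; that temporal stage is omitted here to keep the sketch focused on the motion-guided topology.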
Pages: 1077-1086
Number of pages: 9