Scale Adaptive Graph Convolutional Network for Skeleton-Based Action Recognition

Cited by: 2
Authors
Wang X. [1 ]
Zhong Y. [1 ]
Jin L. [1 ]
Xiao Y. [1 ]
Affiliations
[1] School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing
Source
Tianjin Daxue Xuebao (Ziran Kexue yu Gongcheng Jishu Ban)/Journal of Tianjin University Science and Technology | 2022, Vol. 55, No. 3
Funding
National Natural Science Foundation of China;
Keywords
Action recognition; Graph convolutional network(GCN); Human skeleton; Scale adaptive;
DOI
10.11784/tdxbz202012073
Abstract
In skeleton-based action recognition, graph convolutional networks (GCNs), which model human skeleton sequences as spatiotemporal graphs, have achieved excellent performance. However, in existing GCN-based methods, the topology of the graph is set manually and fixed over all layers and input samples, which may not be optimal for diverse samples. Constructing a scale-adaptive graph based on sample characteristics can better capture spatiotemporal features. Moreover, most methods do not explicitly exploit the multiple scales of body components, which carry crucial information for action recognition. In this paper, we propose a scale-adaptive graph convolutional network comprising a dynamic scale graph convolution module and a multiscale fusion module. Specifically, we first use a prior and an attention mechanism to construct an activity judger, which divides the keypoints into two parts based on whether each is active; thereafter, a scale-adaptive graph is automatically learned. This module accelerates feature transfer between nodes while minimizing feature loss. Furthermore, we propose a multiscale fusion module based on the channel attention mechanism to extract features at different scales and fuse features across scales. Moreover, we use a four-stream framework to model the first-order, second-order, and motion information of a skeleton, which yields a notable improvement in recognition accuracy. Extensive experiments on the NTU-RGBD dataset demonstrate the effectiveness of our method: the algorithm achieves 89.7% and 96.1% classification accuracy on the cross-subject (CS) and cross-view (CV) subsets of NTU-RGBD, respectively, significantly improving the accuracy of action recognition. © 2022, Editorial Board of Journal of Tianjin University (Science and Technology). All rights reserved.
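The core idea the abstract describes, a fixed, manually set skeleton topology augmented with a learned, sample-dependent adjacency before graph convolution, can be illustrated with a minimal sketch. This is a generic adaptive graph convolution in the ST-GCN family, not the paper's actual implementation; all names, shapes, and the toy 3-joint skeleton are illustrative assumptions.

```python
import numpy as np

def normalize_adjacency(A):
    # Symmetric normalization D^(-1/2) (A + I) D^(-1/2),
    # the standard preprocessing of the fixed skeleton graph in GCNs.
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def adaptive_graph_conv(X, A_fixed, B_adaptive, W):
    # X: (N, C_in) joint features; A_fixed: manually set skeleton topology;
    # B_adaptive: a learned / data-dependent adjacency added on top of the
    # fixed graph, so edge strengths can vary per sample and per layer.
    A = normalize_adjacency(A_fixed) + B_adaptive
    return A @ X @ W  # aggregate over the graph, then project channels

# Toy 3-joint chain "skeleton": joint 0 - joint 1 - joint 2
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
X = np.random.randn(3, 4)        # 3 joints, 4 input channels
B = np.zeros((3, 3))             # learned part, zero-initialized here
W = np.random.randn(4, 8)        # channel projection to 8 output channels
Y = adaptive_graph_conv(X, A, B, W)
print(Y.shape)  # (3, 8)
```

With `B` nonzero, the model can strengthen connections among joints the activity judger deems active, which is the role the scale-adaptive graph plays in the abstract.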
Pages: 306-312
Number of pages: 6