An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Cited by: 0
Authors
Yuling Xing
Jia Zhu
Yu Li
Jin Huang
Jinlong Song
Affiliations
[1] South China Normal University, Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province
[2] Zhejiang Normal University
Source
Applied Intelligence | 2023, Vol. 53
Keywords
Action recognition; Adaptive graph; Multi-scale; Occlusion and noise
Abstract
Skeleton-based action recognition methods using complete human skeletons have achieved remarkable performance, but their accuracy can deteriorate significantly when critical joints or frames of the skeleton sequence are occluded or disrupted. Yet the acquisition of incomplete and noisy human skeletons is inevitable in realistic environments. To strengthen the robustness of action recognition models, we propose an Improved Spatial Temporal Graph Convolutional Network (IST-GCN) consisting of three modules: a Multi-dimension Adaptive Graph Convolutional Network (Md-AGCN), an Enhanced Attention Mechanism (EAM), and a Multi-Scale Temporal Convolutional Network (MS-TCN). Specifically, the Md-AGCN module adaptively adjusts the graph structure across layers and across the spatial, temporal, and channel dimensions of each action sample, establishing connections between long-range joints that have dependencies. The EAM module then focuses on important information in the spatial, temporal, and channel domains to further strengthen the dependencies between important joints. Finally, the MS-TCN module enlarges the temporal receptive field to extract more latent temporal dependencies. Comprehensive experiments on the NTU-RGB+D and NTU-RGB+D 120 datasets demonstrate that our approach outperforms state-of-the-art (SOTA) approaches in both accuracy and robustness when skeleton samples are incomplete and noisy. Moreover, our model has far fewer parameters and much lower computational complexity than existing approaches.
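The abstract only names the three modules, so the sketch below is a hedged, minimal PyTorch illustration of the general ideas behind them rather than the authors' implementation: an adaptive graph convolution whose adjacency combines a fixed skeleton graph, a learned offset, and a data-dependent term (Md-AGCN-style), a sequential spatial/temporal/channel attention (EAM-style), and a multi-dilation temporal convolution (MS-TCN-style). The class names, embedding sizes, kernel sizes, and dilation set are assumptions made for illustration.

```python
# Minimal sketch of the three ideas named in the abstract; not the authors' code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveGraphConv(nn.Module):
    """Graph conv over joints with a fixed skeleton adjacency A, a learned
    offset B, and a data-dependent adjacency C computed per sample."""

    def __init__(self, in_ch, out_ch, A, embed_ch=16):
        super().__init__()
        self.register_buffer("A", A)                  # (V, V) skeleton graph
        self.B = nn.Parameter(torch.zeros_like(A))    # learned global offset
        self.theta = nn.Conv2d(in_ch, embed_ch, 1)    # joint embeddings for C
        self.phi = nn.Conv2d(in_ch, embed_ch, 1)
        self.proj = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):                             # x: (N, C, T, V)
        # Data-dependent adjacency from embedding similarity (softmax over joints).
        q = self.theta(x).mean(2).permute(0, 2, 1)    # (N, V, E)
        k = self.phi(x).mean(2)                       # (N, E, V)
        C = F.softmax(torch.bmm(q, k), dim=-1)        # (N, V, V)
        adj = self.A + self.B + C                     # combine the three graphs
        y = torch.einsum("nctv,nvw->nctw", x, adj)    # aggregate over joints
        return self.proj(y)


class STCAttention(nn.Module):
    """Lightweight temporal, spatial, and channel attention applied in sequence."""

    def __init__(self, ch, reduction=4):
        super().__init__()
        self.t_att = nn.Conv1d(ch, 1, kernel_size=9, padding=4)
        self.s_att = nn.Conv1d(ch, 1, kernel_size=1)
        self.c_fc = nn.Sequential(
            nn.Linear(ch, ch // reduction), nn.ReLU(),
            nn.Linear(ch // reduction, ch), nn.Sigmoid())

    def forward(self, x):                             # x: (N, C, T, V)
        a_t = torch.sigmoid(self.t_att(x.mean(-1)))   # frame attention (N, 1, T)
        x = x * a_t.unsqueeze(-1)
        a_s = torch.sigmoid(self.s_att(x.mean(2)))    # joint attention (N, 1, V)
        x = x * a_s.unsqueeze(2)
        a_c = self.c_fc(x.mean((2, 3)))               # channel attention (N, C)
        return x * a_c[..., None, None]


class MultiScaleTCN(nn.Module):
    """Temporal convolutions at several dilations to enlarge the receptive field.
    Assumes ch is divisible by the number of dilation branches."""

    def __init__(self, ch, dilations=(1, 2, 3, 4)):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(ch, ch // len(dilations), (9, 1),
                      padding=(4 * d, 0), dilation=(d, 1))
            for d in dilations])

    def forward(self, x):                             # x: (N, C, T, V)
        return torch.cat([b(x) for b in self.branches], dim=1)
```

In an ST-GCN-style network these pieces would typically be stacked per block as AdaptiveGraphConv, then STCAttention, then MultiScaleTCN with a residual connection, followed by global pooling over frames and joints and a linear classifier; how IST-GCN actually arranges and parameterizes them is specified in the paper, not here.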
Pages: 4592-4608 (16 pages)