An improved spatial temporal graph convolutional network for robust skeleton-based action recognition

Cited: 17
Authors
Xing, Yuling [1]
Zhu, Jia [2]
Li, Yu [1]
Huang, Jin [1]
Song, Jinlong [1]
Affiliations
[1] South China Normal University, 55 Zhongshan Ave West, Guangzhou, People's Republic of China
[2] Zhejiang Normal University, Key Laboratory of Intelligent Education Technology and Application of Zhejiang Province, Hangzhou, Zhejiang, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Action recognition; Adaptive graph; Multi-scale; Occlusion and noise
DOI
10.1007/s10489-022-03589-y
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Skeleton-based action recognition methods using complete human skeletons have achieved remarkable performance, but their performance can deteriorate significantly when critical joints or frames of the skeleton sequence are occluded or disrupted, and acquiring incomplete, noisy human skeletons is inevitable in realistic environments. To strengthen the robustness of action recognition models, we propose an Improved Spatial Temporal Graph Convolutional Network (IST-GCN) comprising three modules: a Multi-dimension Adaptive Graph Convolutional Network (Md-AGCN), an Enhanced Attention Mechanism (EAM), and a Multi-Scale Temporal Convolutional Network (MS-TCN). Specifically, the Md-AGCN module first adjusts the graph structure adaptively, per layer and along the spatial, temporal, and channel dimensions of each action sample, to establish connections between long-range joints that exhibit dependencies. The EAM module then attends to important information in the spatial, temporal, and channel domains to further strengthen the dependencies between important joints. Finally, the MS-TCN module enlarges the temporal receptive field to extract more latent temporal dependencies. Comprehensive experiments on the NTU-RGB+D and NTU-RGB+D 120 datasets demonstrate that our approach outperforms state-of-the-art (SOTA) approaches in both accuracy and robustness when skeleton samples are incomplete or noisy. Moreover, our model has far fewer parameters and much lower computational complexity than existing approaches.
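To make the abstract's architecture concrete, below is a minimal PyTorch sketch of the three named modules. The module names (Md-AGCN, EAM, MS-TCN) and their roles come from the abstract; every concrete design choice here (a 2s-AGCN-style adaptive adjacency, squeeze-and-excitation channel gating, parallel dilated temporal branches, and all tensor shapes) is an assumption based on common ST-GCN variants, not the authors' exact implementation.

import torch
import torch.nn as nn

class MdAGCN(nn.Module):
    # Adaptive graph convolution: a fixed skeleton graph A, a learned
    # per-layer graph B, and a data-dependent graph C inferred from the
    # sample's joint embeddings (a 2s-AGCN-style assumption).
    def __init__(self, in_c, out_c, A, embed_c=16):
        super().__init__()
        self.register_buffer("A", A)                 # (V, V) skeleton adjacency
        self.B = nn.Parameter(torch.zeros_like(A))   # learned, layer-specific
        self.theta = nn.Conv2d(in_c, embed_c, 1)     # joint embeddings for C
        self.phi = nn.Conv2d(in_c, embed_c, 1)
        self.out = nn.Conv2d(in_c, out_c, 1)

    def forward(self, x):                            # x: (N, C, T, V)
        q = self.theta(x).mean(2).permute(0, 2, 1)   # (N, V, embed_c)
        k = self.phi(x).mean(2)                      # (N, embed_c, V)
        C = torch.softmax(q @ k, dim=-1)             # per-sample graph (N, V, V)
        adj = self.A + self.B + C                    # combined adjacency
        y = torch.einsum("nctv,nvw->nctw", x, adj)   # graph message passing
        return self.out(y)

class EAM(nn.Module):
    # Attention over the temporal, spatial (joint), and channel domains,
    # applied sequentially (the ordering is an assumption).
    def __init__(self, channels):
        super().__init__()
        self.conv_t = nn.Conv1d(channels, 1, kernel_size=9, padding=4)
        self.conv_s = nn.Conv1d(channels, 1, kernel_size=1)
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // 4), nn.ReLU(),
            nn.Linear(channels // 4, channels), nn.Sigmoid())

    def forward(self, x):                            # x: (N, C, T, V)
        a_t = torch.sigmoid(self.conv_t(x.mean(3)))  # frame scores (N, 1, T)
        x = x * a_t.unsqueeze(-1)
        a_s = torch.sigmoid(self.conv_s(x.mean(2)))  # joint scores (N, 1, V)
        x = x * a_s.unsqueeze(2)
        a_c = self.fc(x.mean((2, 3)))                # channel gate (N, C)
        return x * a_c[:, :, None, None]

class MSTCN(nn.Module):
    # Parallel temporal convolutions with increasing dilation enlarge the
    # temporal receptive field at a fixed parameter budget.
    def __init__(self, channels, dilations=(1, 2, 3, 4)):
        super().__init__()
        branch_c = channels // len(dilations)
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, branch_c, 1),
                nn.BatchNorm2d(branch_c),
                nn.ReLU(),
                nn.Conv2d(branch_c, branch_c, kernel_size=(5, 1),
                          padding=(2 * d, 0), dilation=(d, 1)))
            for d in dilations)

    def forward(self, x):                            # x: (N, C, T, V)
        return torch.cat([b(x) for b in self.branches], dim=1)

# Usage: one IST-GCN-style block on an NTU-RGB+D-shaped input.
V = 25                                               # NTU-RGB+D joint count
A = torch.eye(V)                                     # placeholder adjacency
x = torch.randn(2, 64, 32, V)                        # (batch, C, frames, joints)
block = nn.Sequential(MdAGCN(64, 64, A), EAM(64), MSTCN(64))
print(block(x).shape)                                # torch.Size([2, 64, 32, 25])

The combination A + B + C is what lets the graph connect long-range joint pairs: A encodes the physical skeleton, B learns layer-specific extra edges shared across samples, and C adds per-sample edges wherever two joints' features are similar, which is plausibly what gives robustness when individual joints are occluded or noisy.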
Pages: 4592-4608
Number of pages: 17