Spatial-temporal graph neural network based on node attention

Cited by: 1
Authors
Li, Qiang [1 ]
Wan, Jun [2 ]
Zhang, Wucong [2 ]
Kweh, Qian Long [3 ]
Affiliations
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Peoples R China
[2] Midea Real Estate Holding Ltd, Midea Intelligent Life Res Inst, Foshan, Peoples R China
[3] Canadian Univ Dubai, Dubai, U Arab Emirates
Keywords
Action recognition; skeletons; spatial-temporal graph convolution; attention mechanism
DOI
10.2478/amns.2022.1.00005
CLC number
O29 [Applied Mathematics]
Discipline code
070104
Abstract
Skeleton-based graph neural networks have recently become increasingly popular for action recognition, because a skeleton carries intuitive and rich action information without being affected by background, lighting and other factors. The spatial-temporal graph convolutional network (ST-GCN) is a dynamic skeleton model that automatically learns spatial-temporal patterns from data; it has both stronger expressive power and stronger generalisation ability, and shows remarkable results on public datasets. However, ST-GCN directly learns only the information of adjacent nodes (local information) and is insufficient at learning the relations of non-adjacent nodes (global information), as in a clapping action, which requires relating information from non-adjacent nodes. This paper therefore proposes an ST-GCN based on node attention (NA-STGCN), which addresses the lack of global information in ST-GCN by introducing a node attention module that explicitly models the interdependence between all nodes. Experimental results on the NTU-RGB+D dataset show that the node attention module effectively improves the accuracy and feature-representation ability of the existing algorithm, and clearly improves the recognition of actions that require global information.
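The abstract does not give the exact form of the node attention module, so the following is only a minimal sketch of one plausible design: a squeeze-and-excitation-style gate over the joint (node) axis of an ST-GCN feature map, assuming features of shape (channels, frames, joints) and a small bottleneck MLP whose weights `w1`/`w2` are hypothetical placeholders.

```python
import numpy as np

def node_attention(x, w1, w2):
    """Squeeze-and-excitation-style attention over skeleton joints.

    x  : (C, T, V) feature map -- channels, frames, joints
    w1 : (V, Vr) reduction weights; w2 : (Vr, V) expansion weights
    """
    # squeeze: global average over channels and time -> one scalar per joint
    s = x.mean(axis=(0, 1))                 # (V,)
    # excitation: tiny bottleneck MLP with a sigmoid gate in (0, 1)
    h = np.maximum(s @ w1, 0.0)             # ReLU, (Vr,)
    a = 1.0 / (1.0 + np.exp(-(h @ w2)))     # sigmoid, (V,)
    # rescale every joint's features by its attention weight,
    # letting distant (non-adjacent) joints influence each other
    return x * a                            # broadcasts over (C, T, V)

rng = np.random.default_rng(0)
C, T, V, Vr = 64, 30, 25, 5                 # 25 joints, as in NTU-RGB+D
x = rng.standard_normal((C, T, V))
w1 = rng.standard_normal((V, Vr)) * 0.1
w2 = rng.standard_normal((Vr, V)) * 0.1
y = node_attention(x, w1, w2)
```

Because the squeeze step pools over all joints before computing each gate, every joint's weight depends on every other joint, which is one simple way to model the global interdependence the abstract describes.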
Pages: 703-712
Number of pages: 10