Spatial-temporal graph neural network based on node attention

Cited by: 1
Authors
Li, Qiang [1 ]
Wan, Jun [2 ]
Zhang, Wucong [2 ]
Kweh, Qian Long [3 ]
Institutions
[1] South China Univ Technol, Sch Elect & Informat Engn, Guangzhou, Peoples R China
[2] Midea Real Estate Holding Ltd, Midea Intelligent Life Res Inst, Foshan, Peoples R China
[3] Canadian Univ, Dubai, U Arab Emirates
Keywords
Action recognition; skeletons; spatial-temporal graph convolution; attention mechanism;
DOI
10.2478/amns.2022.1.00005
CLC Number
O29 [Applied Mathematics];
Subject Classification Code
070104;
Abstract
Recently, skeleton-based action recognition with graph neural networks has become increasingly popular, because a skeleton carries intuitive and rich action information without being affected by background, lighting and other factors. The spatial-temporal graph convolutional neural network (ST-GCN) is a dynamic skeleton model that automatically learns spatial-temporal patterns from data; it offers not only stronger expressive power but also stronger generalisation ability, achieving remarkable results on public datasets. However, ST-GCN directly learns the information of adjacent nodes (local information) and is insufficient at learning the relations of non-adjacent nodes (global information), as in clapping, an action that requires learning the related information of non-adjacent nodes. This paper therefore proposes an ST-GCN based on node attention (NA-STGCN), which addresses the lack of global information in ST-GCN by introducing a node attention module that explicitly models the interdependence between global nodes. Experimental results on the NTU-RGB+D dataset show that the node attention module effectively improves the accuracy and feature-representation ability of the existing algorithm, and markedly improves recognition of actions that require global information.
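The abstract describes a node attention module that reweights skeleton joints using globally pooled information, so that non-adjacent joints can influence each other. The exact NA-STGCN formulation is not given in this record; the sketch below is a minimal, hypothetical squeeze-and-excitation style gate over the node (joint) dimension of an ST-GCN feature map, written in PyTorch. The class name, layer sizes and reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn

class NodeAttention(nn.Module):
    """Hypothetical sketch of a node attention module: a per-joint gate
    computed from globally pooled features, letting distant (non-adjacent)
    joints modulate one another. The actual NA-STGCN design may differ."""

    def __init__(self, num_nodes: int, reduction: int = 4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(num_nodes, num_nodes // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(num_nodes // reduction, num_nodes),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, frames, nodes) -- the usual ST-GCN layout
        n, c, t, v = x.shape
        s = x.mean(dim=(1, 2))           # global pool over channels/time -> (batch, nodes)
        w = self.fc(s).view(n, 1, 1, v)  # one gate in (0, 1) per joint
        return x * w                     # reweight every joint globally

x = torch.randn(2, 64, 16, 25)           # 25 joints, as in NTU-RGB+D skeletons
y = NodeAttention(num_nodes=25)(x)
print(y.shape)                           # torch.Size([2, 64, 16, 25])
```

Because the gate is derived from a pooling over all joints, each joint's weight depends on every other joint's activation, which is one simple way to inject the global inter-node dependence the abstract attributes to the node attention module.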
Pages: 703-712
Page count: 10