Spatial Graph Convolutional and Temporal Involution Network for Skeleton-based Action Recognition

Cited by: 0

Authors:

Wan, Huifan [1]

Pan, Guanghui [1]

Chen, Yu [1]

Ding, Danni [1]

Zou, Maoyang [1]

Affiliation:

[1] Chengdu Univ Informat Technol, Sch Comp Sci, Chengdu, Peoples R China

Source:

PROCEEDINGS OF ACM TURING AWARD CELEBRATION CONFERENCE, ACM TURC 2021 | 2021

Keywords:

Action Recognition; GCNs; Involution; Skeleton;

DOI:

10.1145/3472634.3474073

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Existing skeleton-based action recognition methods built on graph convolutional networks (GCNs) do not capture continuous behavior information along the time dimension well, because traditional convolutional neural networks have difficulty extracting temporal information and capturing long-range relationships. Stacking a large number of convolution kernels to enlarge the receptive field and increase feature diversity not only increases the parameter count and computational complexity, but also introduces substantial redundancy in the channel dimension of the convolution kernels. We therefore propose the Spatial Graph Convolutional and Temporal Involution network (ST-TI). First, in the spatial dimension, a GCN is used to obtain the spatial correlations among the human skeleton points within a single frame. Then, in the temporal dimension, the involution operation is used to extract the correlations of skeleton points across frames. The entire model is composed of 9 layers of SG-TI units. Each SG-TI unit uses a residual connection, and a fully connected layer then ensures that the output has the same dimensionality as the set of prediction categories. Finally, the output features are fed to a SoftMax classifier. The model is evaluated on two public action recognition datasets, Kinetics and NTU RGB+D. Experiments show that, compared with the baseline network ST-GCN, the proposed method reduces the number of parameters by a factor of 3.7 and improves recognition accuracy by 3%.
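The unit described in the abstract (a spatial graph convolution within each frame, a temporal involution across frames, and a residual connection) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The function names, the symmetric adjacency normalization, the single linear map used to generate involution kernels, the ReLU activation, and all tensor shapes are assumptions made only for illustration; the abstract does not specify any of them.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph (assumed normalization)."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    d_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
    return d_inv_sqrt @ A @ d_inv_sqrt


def spatial_graph_conv(x, A_hat, W):
    """Mix joint features within each frame.

    x: (T, V, C_in) frames x joints x channels, A_hat: (V, V), W: (C_in, C_out).
    """
    return np.einsum("uv,tvc,cd->tud", A_hat, x, W)


def temporal_involution(x, W_gen, kernel_size=3, groups=1):
    """Involution along the time axis.

    The kernel at each (frame, joint) position is generated from the feature at
    that position and shared across the channels of a group. Here a single
    linear map W_gen of shape (C, kernel_size * groups) generates the kernel;
    this is a simplifying assumption.
    """
    T, V, C = x.shape
    assert kernel_size % 2 == 1 and C % groups == 0
    pad = kernel_size // 2
    # Position-specific kernels: (T, V, K, G), then expanded to (T, V, K, C).
    kernels = (x @ W_gen).reshape(T, V, kernel_size, groups)
    kernels = np.repeat(kernels, C // groups, axis=3)
    # Temporal neighborhoods with zero padding at both ends: (T, V, K, C).
    x_pad = np.pad(x, ((pad, pad), (0, 0), (0, 0)))
    windows = np.stack([x_pad[k:k + T] for k in range(kernel_size)], axis=2)
    # Weighted sum over the temporal window.
    return (windows * kernels).sum(axis=2)


def sg_ti_unit(x, A_hat, W_spatial, W_gen, kernel_size=3, groups=1):
    """One hypothetical unit: spatial GCN, temporal involution, residual, ReLU."""
    h = spatial_graph_conv(x, A_hat, W_spatial)
    h = temporal_involution(h, W_gen, kernel_size, groups)
    return np.maximum(h + x, 0.0)  # residual needs C_in == C_out in this sketch


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, V, C, K, G = 8, 5, 4, 3, 2  # toy sizes, not taken from the paper
    A_hat = normalized_adjacency([(0, 1), (1, 2), (2, 3), (3, 4)], V)
    x = rng.standard_normal((T, V, C))
    W_spatial = rng.standard_normal((C, C)) * 0.1
    W_gen = rng.standard_normal((C, K * G)) * 0.1
    print(sg_ti_unit(x, A_hat, W_spatial, W_gen, K, G).shape)
```

In this sketch the temporal kernel costs kernel_size * groups values per position regardless of the channel count, which gives an intuition for why replacing temporal convolution with involution can reduce channel-wise redundancy; the parameter savings reported in the abstract refer to the authors' model, not to this toy code.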

Pages: 204-209

Page count: 6
