Spatial-temporal slowfast graph convolutional network for skeleton-based action recognition

Cited by: 10
Authors
Fang, Zheng [1]
Zhang, Xiongwei [1]
Cao, Tieyong [1,2]
Zheng, Yunfei [1]
Sun, Meng [1]
Affiliations
[1] Peoples Liberat Army Engn Univ, Inst Command & Control Engn, Nanjing 210001, Jiangsu, Peoples R China
[2] Army Artillery & Def Acad PLA Nanjing, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
computer vision; graph theory; video signal processing; video signals;
DOI
10.1049/cvi2.12080
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In skeleton-based action recognition, the graph convolutional network (GCN) has achieved great success. Modelling skeleton data in a suitable spatial-temporal way and designing the adjacency matrix are crucial for GCN-based methods to capture joint relationships. In this study, we propose the spatial-temporal slowfast graph convolutional network (STSF-GCN) and design the adjacency matrices for the skeleton data graphs in STSF-GCN. STSF-GCN contains two pathways: (1) the fast pathway operates at a high frame rate, and joints of adjacent frames are unified to build 'small' spatial-temporal graphs. A new spatial-temporal adjacency matrix is proposed for these 'small' spatial-temporal graphs, and ablation studies verify its effectiveness. (2) The slow pathway operates at a low frame rate, and joints from all frames are unified to build one 'big' spatial-temporal graph. The adjacency matrix for the 'big' spatial-temporal graph is obtained by computing self-attention coefficients of each joint. Finally, the outputs of the two pathways are fused to predict the action category. STSF-GCN efficiently captures both long-range and short-range spatial-temporal joint relationships. On three skeleton-based action recognition datasets, STSF-GCN achieves state-of-the-art performance at much lower computational cost.
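As a rough illustration of the two-pathway idea described in the abstract, the following PyTorch sketch pairs a high-frame-rate pathway over 'small' windowed spatial-temporal graphs with a low-frame-rate pathway whose 'big' graph adjacency comes from self-attention coefficients, then fuses the two features for classification. All layer sizes, the window length `tau`, the frame-subsampling stride `alpha`, the learnable adjacency in the fast pathway, and the concatenation-based fusion are illustrative assumptions, not the authors' STSF-GCN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """Basic graph convolution: H = ReLU(A X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, nodes, in_dim); adj: (nodes, nodes) or (batch, nodes, nodes)
        return F.relu(self.proj(torch.matmul(adj, x)))


class FastPathway(nn.Module):
    """High frame rate: joints of `tau` adjacent frames form one 'small' graph."""
    def __init__(self, in_dim, out_dim, num_joints, tau=3):
        super().__init__()
        self.tau = tau
        n = tau * num_joints
        # Learnable spatial-temporal adjacency: a stand-in for the paper's
        # proposed adjacency matrix, which is not reproduced here.
        self.adj = nn.Parameter(torch.eye(n) + 0.01 * torch.randn(n, n))
        self.gcn = GraphConv(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, frames, joints, channels)
        b, t, v, c = x.shape
        t_crop = (t // self.tau) * self.tau
        # Split the sequence into non-overlapping windows of tau frames,
        # each window becoming one small graph of tau * joints nodes.
        windows = x[:, :t_crop].reshape(b * (t_crop // self.tau), self.tau * v, c)
        out = self.gcn(windows, F.softmax(self.adj, dim=-1))
        return out.mean(dim=1).reshape(b, -1, out.shape[-1]).mean(dim=1)


class SlowPathway(nn.Module):
    """Low frame rate: all sampled joints form one 'big' graph whose adjacency
    comes from self-attention coefficients."""
    def __init__(self, in_dim, out_dim, alpha=4):
        super().__init__()
        self.alpha = alpha
        self.q = nn.Linear(in_dim, out_dim)
        self.k = nn.Linear(in_dim, out_dim)
        self.gcn = GraphConv(in_dim, out_dim)

    def forward(self, x):
        b, t, v, c = x.shape
        nodes = x[:, ::self.alpha].reshape(b, -1, c)      # subsample frames
        scores = torch.matmul(self.q(nodes), self.k(nodes).transpose(1, 2))
        adj = F.softmax(scores / self.q.out_features ** 0.5, dim=-1)
        return self.gcn(nodes, adj).mean(dim=1)


class STSFSketch(nn.Module):
    """Fuse slow- and fast-pathway features to predict the action class."""
    def __init__(self, in_dim=3, hidden=64, num_joints=25, num_classes=60):
        super().__init__()
        self.fast = FastPathway(in_dim, hidden, num_joints)
        self.slow = SlowPathway(in_dim, hidden)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, frames, joints, channels), e.g. 3D joint coordinates
        return self.fc(torch.cat([self.fast(x), self.slow(x)], dim=-1))


# Example with NTU-style input: 25 joints, 64 frames, 3D coordinates.
logits = STSFSketch()(torch.randn(2, 64, 25, 3))          # -> shape (2, 60)
```

The sketch keeps only the structural split the abstract describes (small per-window graphs at full frame rate versus one attention-weighted graph over subsampled frames); how the paper actually builds its adjacency matrices and fuses the pathways is detailed in the full text.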
Pages: 205-217
Page count: 13
Cited references (41 in total)
[1] Cao, Congqi; Lan, Cuiling; Zhang, Yifan; Zeng, Wenjun; Lu, Hanqing; Zhang, Yanning. Skeleton-Based Action Recognition With Gated Convolutional Neural Networks [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29(11): 3247-3257.
[2] Cheng, Ke; Zhang, Yifan; He, Xiangyu; Chen, Weihan; Cheng, Jian; Lu, Hanqing. Skeleton-Based Action Recognition with Shift Graph Convolutional Network [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 180-189.
[3] Cui, Ganqu; Zhou, Jie; Yang, Cheng; Liu, Zhiyuan. Adaptive Graph Encoder for Attributed Graph Embedding [J]. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 976-985.
[4] Du Y, 2015, PROC CVPR IEEE, P1110, DOI 10.1109/CVPR.2015.7298714.
[5] Feichtenhofer, Christoph; Fan, Haoqi; Malik, Jitendra; He, Kaiming. SlowFast Networks for Video Recognition [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6201-6210.
[6] Gao, Xiang; Hu, Wei; Tang, Jiaxiang; Liu, Jiaying; Guo, Zongming. Optimized Skeleton-based Action Recognition via Sparsified Graph Regression [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 601-610.
[7] Huang LJ, 2020, AAAI CONF ARTIF INTE, V34, P11045.
[8] Ke, Qiuhong; Bennamoun, Mohammed; An, Senjian; Sohel, Ferdous; Boussaid, Farid. Learning Clip Representations for Skeleton-Based 3D Action Recognition [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27(06): 2842-2855.
[9] Ke, Qiuhong; Bennamoun, Mohammed; An, Senjian; Sohel, Ferdous; Boussaid, Farid. A New Representation of Skeleton Sequences for 3D Action Recognition [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4570-4579.
[10] Lee, Inwoong; Kim, Doyoung; Kang, Seoungyoon; Lee, Sanghoon. Ensemble Deep Learning for Skeleton-based Action Recognition using Temporal Sliding LSTM networks [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 1012-1020.