Spatial-temporal slowfast graph convolutional network for skeleton-based action recognition

Cited by: 10
Authors
Fang, Zheng [1]
Zhang, Xiongwei [1]
Cao, Tieyong [1,2]
Zheng, Yunfei [1]
Sun, Meng [1]
Affiliations
[1] Peoples Liberat Army Engn Univ, Inst Command & Control Engn, Nanjing 210001, Jiangsu, Peoples R China
[2] Army Artillery & Def Acad PLA Nanjing, Nanjing, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
computer vision; graph theory; video signal processing; video signals;
DOI
10.1049/cvi2.12080
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In skeleton-based action recognition, the graph convolutional network (GCN) has achieved great success. Modelling skeleton data in a suitable spatial-temporal way and designing the adjacency matrix are crucial for GCN-based methods to capture joint relationships. In this study, we propose the spatial-temporal slowfast graph convolutional network (STSF-GCN) and design the adjacency matrices for the skeleton data graphs in STSF-GCN. STSF-GCN contains two pathways: (1) the fast pathway operates at a high frame rate, and joints of adjacent frames are unified to build 'small' spatial-temporal graphs. A new spatial-temporal adjacency matrix is proposed for these 'small' spatial-temporal graphs, and ablation studies verify its effectiveness. (2) The slow pathway operates at a low frame rate, and joints from all frames are unified to build one 'big' spatial-temporal graph. The adjacency matrix for the 'big' spatial-temporal graph is obtained by computing self-attention coefficients of each joint. Finally, the outputs of the two pathways are fused to predict the action category. STSF-GCN efficiently captures both long-range and short-range spatial-temporal joint relationships. On three skeleton-based action recognition datasets, STSF-GCN achieves state-of-the-art performance at much lower computational cost.
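As a rough illustration of the two-pathway idea described in the abstract, the following PyTorch sketch pairs a high-frame-rate pathway over 'small' windowed spatial-temporal graphs with a low-frame-rate pathway whose 'big' graph adjacency comes from self-attention coefficients, then fuses the two features for classification. All layer sizes, the window length `tau`, the frame-subsampling stride `alpha`, the learnable adjacency in the fast pathway, and the concatenation-based fusion are illustrative assumptions, not the authors' STSF-GCN implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GraphConv(nn.Module):
    """Basic graph convolution: H = ReLU(A X W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x: (batch, nodes, in_dim); adj: (nodes, nodes) or (batch, nodes, nodes)
        return F.relu(self.proj(torch.matmul(adj, x)))


class FastPathway(nn.Module):
    """High frame rate: joints of `tau` adjacent frames form one 'small' graph."""
    def __init__(self, in_dim, out_dim, num_joints, tau=3):
        super().__init__()
        self.tau = tau
        n = tau * num_joints
        # Learnable spatial-temporal adjacency: a stand-in for the paper's
        # proposed adjacency matrix, which is not reproduced here.
        self.adj = nn.Parameter(torch.eye(n) + 0.01 * torch.randn(n, n))
        self.gcn = GraphConv(in_dim, out_dim)

    def forward(self, x):
        # x: (batch, frames, joints, channels)
        b, t, v, c = x.shape
        t_crop = (t // self.tau) * self.tau
        # Split the sequence into non-overlapping windows of tau frames,
        # each window becoming one small graph of tau * joints nodes.
        windows = x[:, :t_crop].reshape(b * (t_crop // self.tau), self.tau * v, c)
        out = self.gcn(windows, F.softmax(self.adj, dim=-1))
        return out.mean(dim=1).reshape(b, -1, out.shape[-1]).mean(dim=1)


class SlowPathway(nn.Module):
    """Low frame rate: all sampled joints form one 'big' graph whose adjacency
    comes from self-attention coefficients."""
    def __init__(self, in_dim, out_dim, alpha=4):
        super().__init__()
        self.alpha = alpha
        self.q = nn.Linear(in_dim, out_dim)
        self.k = nn.Linear(in_dim, out_dim)
        self.gcn = GraphConv(in_dim, out_dim)

    def forward(self, x):
        b, t, v, c = x.shape
        nodes = x[:, ::self.alpha].reshape(b, -1, c)      # subsample frames
        scores = torch.matmul(self.q(nodes), self.k(nodes).transpose(1, 2))
        adj = F.softmax(scores / self.q.out_features ** 0.5, dim=-1)
        return self.gcn(nodes, adj).mean(dim=1)


class STSFSketch(nn.Module):
    """Fuse slow- and fast-pathway features to predict the action class."""
    def __init__(self, in_dim=3, hidden=64, num_joints=25, num_classes=60):
        super().__init__()
        self.fast = FastPathway(in_dim, hidden, num_joints)
        self.slow = SlowPathway(in_dim, hidden)
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x):
        # x: (batch, frames, joints, channels), e.g. 3D joint coordinates
        return self.fc(torch.cat([self.fast(x), self.slow(x)], dim=-1))


# Example with NTU-style input: 25 joints, 64 frames, 3D coordinates.
logits = STSFSketch()(torch.randn(2, 64, 25, 3))          # -> shape (2, 60)
```

The sketch keeps only the structural split the abstract describes (small per-window graphs at full frame rate versus one attention-weighted graph over subsampled frames); how the paper actually builds its adjacency matrices and fuses the pathways is detailed in the full text.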
Pages: 205-217
Page count: 13
Cited references (41 in total)
[1] Cao, Congqi; Lan, Cuiling; Zhang, Yifan; Zeng, Wenjun; Lu, Hanqing; Zhang, Yanning. Skeleton-Based Action Recognition With Gated Convolutional Neural Networks [J]. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2019, 29(11): 3247-3257.
[2] Cheng, Ke; Zhang, Yifan; He, Xiangyu; Chen, Weihan; Cheng, Jian; Lu, Hanqing. Skeleton-Based Action Recognition with Shift Graph Convolutional Network [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 180-189.
[3] Cui, Ganqu; Zhou, Jie; Yang, Cheng; Liu, Zhiyuan. Adaptive Graph Encoder for Attributed Graph Embedding [J]. KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020: 976-985.
[4] Du Y, 2015, PROC CVPR IEEE, P1110, DOI 10.1109/CVPR.2015.7298714.
[5] Feichtenhofer, Christoph; Fan, Haoqi; Malik, Jitendra; He, Kaiming. SlowFast Networks for Video Recognition [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 6201-6210.
[6] Gao, Xiang; Hu, Wei; Tang, Jiaxiang; Liu, Jiaying; Guo, Zongming. Optimized Skeleton-based Action Recognition via Sparsified Graph Regression [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 601-610.
[7] Huang LJ, 2020, AAAI CONF ARTIF INTE, V34, P11045.
[8] Ke, Qiuhong; Bennamoun, Mohammed; An, Senjian; Sohel, Ferdous; Boussaid, Farid. Learning Clip Representations for Skeleton-Based 3D Action Recognition [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2018, 27(06): 2842-2855.
[9] Ke, Qiuhong; Bennamoun, Mohammed; An, Senjian; Sohel, Ferdous; Boussaid, Farid. A New Representation of Skeleton Sequences for 3D Action Recognition [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4570-4579.
[10] Lee, Inwoong; Kim, Doyoung; Kang, Seoungyoon; Lee, Sanghoon. Ensemble Deep Learning for Skeleton-based Action Recognition using Temporal Sliding LSTM networks [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 1012-1020.