Temporal segment graph convolutional networks for skeleton-based action recognition

Citations: 32
Authors
Ding, Chongyang [1]
Wen, Shan [1]
Ding, Wenwen [2]
Liu, Kai [1]
Belyaev, Evgeny [3]
Affiliations
[1] Xidian Univ, Sch Comp Sci & Technol, Xian, Peoples R China
[2] Huaibei Normal Univ, Sch Math Sci, Huaibei, Peoples R China
[3] ITMO Univ, Int Lab Comp Technol, St Petersburg, Russia
Keywords
Skeleton sequence; Temporal misalignment; Action recognition; Temporal segment; Graph construction
DOI
10.1016/j.engappai.2022.104675
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Different actions usually emphasize different parts of the skeleton, and even within a single action, different stages have their own emphases. Previous studies generally construct the human skeleton graph in a predefined way and thus lack adaptability to different action modes. In addition, these methods simply pad or truncate the skeleton sequence to fix its length, which introduces an additional temporal misalignment problem. In this work, we propose a novel temporal segment graph convolutional network (TS-GCN) for skeleton-based action recognition. Our model divides the whole sequence into several subsequences. GCNs are then applied to each subsequence to capture the dynamic information stage by stage, which aligns the motion features in the temporal domain. Moreover, to explore the intrinsic features contained in each subsequence, our model introduces a graph-adaptive method that constructs an individual graph for each subsequence, learned and updated from the skeleton data, which makes the graph construction general enough to adapt to different sequences. Extensive experiments are conducted on two standard datasets, NTU-RGB+D and Kinetics. The experimental results demonstrate the effectiveness of the proposed method.
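The segment-then-convolve idea described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' TS-GCN architecture: the function names, the equal-length splitting rule, the row normalization, the single ReLU(A X W) layer, and the mean pooling are all assumptions made for illustration, and the random per-segment adjacency matrices merely stand in for graphs that the paper learns from data.

```python
import random

def split_into_segments(frames, num_segments):
    """Split a list of frames into num_segments nearly equal, contiguous chunks."""
    base, extra = divmod(len(frames), num_segments)
    segments, start = [], 0
    for i in range(num_segments):
        size = base + (1 if i < extra else 0)
        segments.append(frames[start:start + size])
        start += size
    return segments

def row_normalize(adj):
    """Add self-loops to a non-negative adjacency matrix and row-normalize it."""
    n = len(adj)
    loops = [[abs(adj[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    return [[v / sum(row) for v in row] for row in loops]

def matmul(a, b):
    """Plain matrix product of two nested-list matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def graph_conv(frame, adj_norm, weight):
    """ReLU(A_norm @ X @ W) for one frame X of shape (joints, channels)."""
    out = matmul(matmul(adj_norm, frame), weight)
    return [[max(v, 0.0) for v in row] for row in out]

def segment_features(segment, adj, weight):
    """Average graph-convolved features over all frames and joints of a segment."""
    adj_norm = row_normalize(adj)
    outputs = [graph_conv(frame, adj_norm, weight) for frame in segment]
    count = len(segment) * len(adj)
    return [sum(out[j][d] for out in outputs for j in range(len(adj))) / count
            for d in range(len(weight[0]))]

# Hypothetical sizes: T frames, V joints, C input channels, D output channels, S segments.
random.seed(0)
T, V, C, D, S = 10, 5, 3, 4, 3
sequence = [[[random.gauss(0, 1) for _ in range(C)] for _ in range(V)] for _ in range(T)]
segments = split_into_segments(sequence, S)
# One adjacency matrix and one weight matrix per segment (random stand-ins, not learned).
adjacencies = [[[random.random() for _ in range(V)] for _ in range(V)] for _ in range(S)]
weights = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(C)] for _ in range(S)]
features = [segment_features(seg, a, w)
            for seg, a, w in zip(segments, adjacencies, weights)]
print([len(seg) for seg in segments])   # [4, 3, 3]
print(len(features), len(features[0]))  # 3 4
```

Each segment gets its own graph and its own weights, so a stage of the action that emphasizes, say, the arms can use different joint connections than a stage that emphasizes the legs. In a trained model the adjacency entries would be parameters updated by gradient descent rather than random values.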
Pages: 8