PGCN-TCA: Pseudo Graph Convolutional Network With Temporal and Channel-Wise Attention for Skeleton-Based Action Recognition

Cited by: 44
Authors
Yang, Hongye [1, 2]
Gu, Yuzhang [1, 2]
Zhu, Jianchao [3]
Hu, Keli [4]
Zhang, Xiaolin [1, 2, 5]
Affiliations
[1] Chinese Acad Sci, Shanghai Inst Microsyst & Informat Technol, Biovis Syst Lab, State Key Lab Transducer Technol, Shanghai 200050, Peoples R China
[2] Univ Chinese Acad Sci, Sch Elect Elect & Commun Engn, Beijing 100049, Peoples R China
[3] East China Normal Univ, Sch Comp Sci & Software Engn, Shanghai 200062, Peoples R China
[4] Shaoxing Univ, Dept Comp Sci & Engn, Shaoxing 312000, Peoples R China
[5] ShanghaiTech Univ, Sch Informat Sci & Technol, Shanghai 201210, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Computer vision; skeleton-based action recognition; temporal and channel-wise attention;
DOI
10.1109/ACCESS.2020.2964115
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Skeleton-based human action recognition has become an active research area in recent years. The key to this task is to fully exploit both spatial and temporal features. Recently, GCN-based methods, which model human body skeletons as spatial-temporal graphs, have achieved remarkable performance. However, most GCN-based methods use a fixed adjacency matrix defined by the dataset. Such a matrix captures only the structural information provided by joints directly connected through bones and ignores the dependencies between distant joints that are not connected. In addition, using the same fixed adjacency matrix in all layers prevents the network from extracting multi-level semantic features. In this paper, we propose a pseudo graph convolutional network with temporal and channel-wise attention (PGCN-TCA) to solve these problems. The fixed normalized adjacency matrix is replaced with a learnable matrix. In this way, the matrix can learn the dependencies between connected joints as well as between joints that are not physically connected. At the same time, learnable matrices in different layers help the network capture multi-level features in the spatial domain. Moreover, since frames and input channels that contain outstanding characteristics play significant roles in distinguishing one action from others, we propose a mixed temporal and channel-wise attention. Our method achieves performance comparable to state-of-the-art methods on the NTU-RGB+D and HDM05 datasets.
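The two ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the tensor layout (channels, frames, joints), the function names `pseudo_graph_conv` and `temporal_channel_attention`, and the sigmoid gating over averaged descriptors are all assumptions made for illustration, since the abstract does not give the exact formulation. In a trained network the matrix `adj` and the gate weights would be updated by gradient descent; here they are plain arrays.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def pseudo_graph_conv(x, adj, weight):
    """Graph convolution with a free (learnable) joint-mixing matrix.

    x:      (C_in, T, V) features for T frames and V joints
    adj:    (V, V) matrix; assumed to be a trainable parameter, not
            restricted to the bone connections of the skeleton
    weight: (C_out, C_in) channel-mixing weights (a 1x1 convolution)
    """
    # out[c, t, w] = sum_v x[c, t, v] * adj[v, w]
    mixed = np.einsum("ctv,vw->ctw", x, adj)
    return np.einsum("oc,ctw->otw", weight, mixed)


def temporal_channel_attention(x, w_t, w_c):
    """Hypothetical mixed attention: one gate per frame and per channel.

    x: (C, T, V); w_t: (T, T) and w_c: (C, C) are assumed gate weights.
    """
    frame_desc = x.mean(axis=(0, 2))        # (T,) average over channels, joints
    chan_desc = x.mean(axis=(1, 2))         # (C,) average over frames, joints
    frame_gate = sigmoid(w_t @ frame_desc)  # (T,) values in (0, 1)
    chan_gate = sigmoid(w_c @ chan_desc)    # (C,) values in (0, 1)
    # Rescale every feature by its channel gate and its frame gate.
    return x * chan_gate[:, None, None] * frame_gate[None, :, None]


# Toy 5-joint chain skeleton: bones link only neighbouring joints.
chain = np.eye(5) + np.eye(5, k=1) + np.eye(5, k=-1)
# A dense matrix stands in for what a learnable matrix could become.
dense = chain + 0.1

x = np.ones((3, 4, 5))          # 3 channels, 4 frames, 5 joints
weight = np.ones((2, 3))
features = pseudo_graph_conv(x, dense, weight)
attended = temporal_channel_attention(features, np.zeros((4, 4)), np.zeros((2, 2)))
```

With the fixed `chain` matrix, a change at joint 0 cannot reach joint 4 in a single layer because the corresponding entry is zero. With the dense matrix that entry is non-zero, so the two distant joints can influence each other, which is the dependency the abstract says a fixed adjacency matrix ignores.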
Pages: 10040-10047
Page count: 8