Adaptive multi-view graph convolutional networks for skeleton-based action recognition

Times cited: 21

Authors:

Liu, Xing [1]

Li, Yanshan [1]

Xia, Rongjie [1]

Affiliation:

[1] Shenzhen Univ, ATR Natl Key Lab Def Technol, Shenzhen, Peoples R China

Source:

NEUROCOMPUTING | 2021 / Vol. 444

Funding:

National Natural Science Foundation of China;

Keywords:

Human action recognition; Graph convolution; View adaptation; Multiple viewpoints; REPRESENTATION;

DOI:

10.1016/j.neucom.2020.03.126

Chinese Library Classification (CLC):

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104; 0812; 0835; 1405;

Abstract:

Skeleton-based human action recognition has attracted increasing attention recently, thanks to the accessibility of depth sensors and the development of pose estimation techniques. Conventional approaches such as convolutional neural networks usually model skeletons with grid-shaped representations, which cannot explicitly explore the dependency between correlated joints. In this paper, we treat the skeleton as a single graph with joints as nodes and bones as edges. Based on this skeleton graph, we propose an improved graph convolutional network, adaptive multi-view graph convolutional networks (AMV-GCNs), for skeleton-based action recognition. We first construct a novel skeleton graph in which two kinds of graph nodes are defined to model the spatial configuration and the temporal dynamics, respectively. The generated graphs, along with the feature vectors on the graph nodes, are then fed into AMV-GCNs. In AMV-GCNs, an adaptive view transformation module is designed to reduce the impact of view diversity: the module automatically determines suitable viewpoints and transforms skeletons into new representations under those viewpoints for better recognition. Further, we employ multiple GCN-based streams to learn action information from different viewpoints. Finally, the classification scores from the multiple streams are fused to produce the recognition result. Extensive experiments on four challenging datasets, NTU RGB+D 60, NTU RGB+D 120, Northwestern-UCLA and UTD-MHAD, demonstrate the superiority of the proposed network. (c) 2020 Published by Elsevier B.V.
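The pipeline outlined in the abstract (skeleton graph, view transformation, per-view GCN streams, score fusion) can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' AMV-GCN implementation. The 5-joint skeleton (`BONES`), all helper names, the fixed rotation angles in `viewpoints` (the paper's module determines viewpoints automatically), the single symmetric-normalized graph convolution layer, the untrained random weights, and averaging as the fusion rule are all hypothetical choices. The separate spatial and temporal node types are not modeled.

```python
import numpy as np

# Hypothetical toy skeleton (not the paper's joint layout): 5 joints,
# bones given as (parent, child) index pairs.
NUM_JOINTS = 5
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def normalized_adjacency(num_joints, bones):
    """Joints as nodes, bones as edges: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in bones:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def rotation_matrix(alpha, beta, gamma):
    """Rotation about the x, y and z axes (radians), composed as Rz @ Ry @ Rx."""
    ca, sa = np.cos(alpha), np.sin(alpha)
    cb, sb = np.cos(beta), np.sin(beta)
    cg, sg = np.cos(gamma), np.sin(gamma)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, ca, -sa], [0.0, sa, ca]])
    ry = np.array([[cb, 0.0, sb], [0.0, 1.0, 0.0], [-sb, 0.0, cb]])
    rz = np.array([[cg, -sg, 0.0], [sg, cg, 0.0], [0.0, 0.0, 1.0]])
    return rz @ ry @ rx


def view_transform(joints, angles, origin):
    """Re-express (T, V, 3) joint coordinates under a new viewpoint.

    Assumption: the angles are fixed inputs here; in AMV-GCNs they are
    determined adaptively by the view transformation module.
    """
    return (joints - origin) @ rotation_matrix(*angles).T


def gcn_layer(x, a_hat, w):
    """One graph convolution per frame: ReLU(A_hat X W), x of shape (T, V, C_in)."""
    return np.maximum(a_hat @ x @ w, 0.0)


def stream_scores(joints, angles, a_hat, w_gcn, w_cls):
    """One view-specific stream: transform, convolve, pool, classify (softmax)."""
    x = view_transform(joints, angles, origin=joints[0, 0])  # first-frame root joint
    h = gcn_layer(x, a_hat, w_gcn)
    logits = h.mean(axis=(0, 1)) @ w_cls  # global average pool over frames and joints
    e = np.exp(logits - logits.max())
    return e / e.sum()


def fused_prediction(joints, viewpoints, a_hat, weights):
    """Late fusion (assumed rule): average the class scores of all streams."""
    scores = [
        stream_scores(joints, angles, a_hat, w_gcn, w_cls)
        for angles, (w_gcn, w_cls) in zip(viewpoints, weights)
    ]
    return np.mean(scores, axis=0)


# Usage with random (untrained) weights and a random 8-frame sequence.
rng = np.random.default_rng(0)
T, C_HID, N_CLASSES = 8, 16, 4
joints = rng.normal(size=(T, NUM_JOINTS, 3))
a_hat = normalized_adjacency(NUM_JOINTS, BONES)
viewpoints = [(0.0, 0.0, 0.0), (0.0, np.pi / 4, 0.0), (0.0, -np.pi / 4, 0.0)]
weights = [
    (rng.normal(size=(3, C_HID)), rng.normal(size=(C_HID, N_CLASSES)))
    for _ in viewpoints
]
fused = fused_prediction(joints, viewpoints, a_hat, weights)
print("fused scores:", fused, "predicted class:", int(fused.argmax()))
```

Because each viewpoint is a pure rotation about a fixed origin, bone lengths are preserved: every stream sees the same body from a different angle, and only the coordinate frame changes.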


Pages: 288-300

Page count: 13
