Skeleton-Based Action Recognition Using Spatio-Temporal LSTM Network with Trust Gates

Cited by: 366

Authors:

Liu, Jun ^{[1]}

Shahroudy, Amir ^{[1]}

Xu, Dong ^{[3]}

Kot, Alex C. ^{[1]}

Wang, Gang ^{[2]}

Affiliations:

[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore

[2] Alibaba Grp, Hangzhou 310052, Zhejiang, Peoples R China

[3] Univ Sydney, Sch Elect & Informat Engn, Sydney, NSW 2006, Australia

Source:

IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE | 2018 / Vol. 40 / No. 12

Funding:

National Research Foundation, Singapore;

Keywords:

Action recognition; recurrent neural networks; long short-term memory; spatio-temporal analysis; tree traversal; trust gate; skeleton sequence; HISTOGRAMS; JOINTS; POSE;

DOI:

10.1109/TPAMI.2017.2771306

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Skeleton-based human action recognition has attracted a lot of research attention during the past few years. Recent works attempted to utilize recurrent neural networks to model the temporal dependencies between the 3D positional configurations of human body joints for better analysis of human activities in the skeletal data. The proposed work extends this idea to spatial domain as well as temporal domain to better analyze the hidden sources of action-related information within the human skeleton sequences in both of these domains simultaneously. Based on the pictorial structure of Kinect's skeletal data, an effective tree-structure based traversal framework is also proposed. In order to deal with the noise in the skeletal data, a new gating mechanism within LSTM module is introduced, with which the network can learn the reliability of the sequential data and accordingly adjust the effect of the input data on the updating procedure of the long-term context representation stored in the unit's memory cell. Moreover, we introduce a novel multi-modal feature fusion strategy within the LSTM unit in this paper. The comprehensive experimental results on seven challenging benchmark datasets for human action recognition demonstrate the effectiveness of the proposed method.
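The gating idea in the abstract (let the network judge how reliable an input is, then limit how much that input changes the memory cell) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names `trust_gate` and `update_cell`, the Gaussian-shaped score, the parameter `lam`, and the single-context cell update are hypothetical and are not taken from this record or presented as the paper's equations.

```python
import math

def trust_gate(x, p, lam=0.5):
    """Hypothetical reliability score per dimension.

    x: the (projected) input at the current step.
    p: a prediction of that input made from the stored context.
    Returns values in (0, 1]: 1.0 when x matches p, decaying toward 0
    as they diverge. lam is an assumed sharpness parameter.
    """
    return [math.exp(-lam * (xi - pi) ** 2) for xi, pi in zip(x, p)]

def update_cell(c_prev, i_gate, f_gate, u, tau):
    """Assumed cell update: a trusted input (tau near 1) writes new
    content i_gate * u; an untrusted input (tau near 0) leaves the
    forget-gated previous memory f_gate * c_prev in place."""
    return [
        t * i * ui + (1.0 - t) * f * c
        for c, i, f, ui, t in zip(c_prev, i_gate, f_gate, u, tau)
    ]

# First dimension: input agrees with the prediction (trusted).
# Second dimension: input is far from the prediction (likely noise).
tau = trust_gate([1.0, 5.0], [1.0, 0.0])
cell = update_cell([2.0, 2.0], [1.0, 1.0], [1.0, 1.0], [0.5, 0.5], tau)
print(tau, cell)
```

In this sketch the trusted dimension takes the new candidate value, while the noisy dimension stays close to its previous memory, which is the qualitative behaviour the abstract describes.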


Pages: 3007-3021

Page count: 15

