Action Recognition Scheme Based on Skeleton Representation With DS-LSTM Network

Cited by: 55

Authors:

Jiang, Xinghao [1]

Xu, Ke [1]

Sun, Tanfeng [2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Shanghai 200240, Peoples R China

[2] Shanghai Jiao Tong Univ, Sch Elect Informat & Elect Engn, Natl Engn Lab Informat Content Anal Technol, Shanghai 200240, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2020 / Vol. 30 / No. 07

Keywords:

Skeleton; Hidden Markov models; Three-dimensional displays; Noise reduction; Geometry; Robustness; Electrical engineering; Action recognition; skeleton; DS-LSTM; ST-STD; Lie group; STAE; FEATURES;

DOI:

10.1109/TCSVT.2019.2914137

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Skeleton-based human action recognition has been a popular research field over the past few years. With cameras equipped with depth sensors, such as the Kinect, a human action can be represented as a sequence of skeleton data. Inspired by skeleton descriptors based on Lie groups, this paper proposes a spatial-temporal skeleton transformation descriptor (ST-STD). The ST-STD describes the relative transformations of skeletons, including the rotation and translation during movement, and gives a comprehensive view of the skeleton in both the spatial and temporal domains for each frame. To capture the temporal connections in the skeleton sequence, the paper also proposes a denoising sparse long short-term memory (DS-LSTM) network, which is designed to deal with two problems in action recognition. First, to decrease intra-class diversity, a spatial-temporal auto-encoder (STAE) generates more abstract representations. The denoising constraint and the sparsity constraint are applied in both the spatial and temporal domains to enhance robustness and to reduce action misalignment. Second, to model the action sequence, a three-layer LSTM structure is trained on the STAE representations for temporal modeling and classification. Experiments are carried out on four popular datasets. The results show that the approach performs better than several existing skeleton-based action recognition methods, which demonstrates its effectiveness.
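The abstract does not give the exact definition of the ST-STD, so the following is only a minimal sketch of the general idea of a relative transformation (rotation plus translation) between two bones. It assumes each bone is a pair of 3D joint positions, takes the rotation that aligns one bone direction with the other (Rodrigues' formula), and takes the translation as the offset between the bones' start joints. The function names `rotation_between` and `relative_transform` are hypothetical, and the paper's actual descriptor may differ.

```python
import math

def _sub(a, b):
    return [x - y for x, y in zip(a, b)]

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _unit(v):
    n = math.sqrt(_dot(v, v))
    if n == 0.0:
        raise ValueError("zero-length bone")
    return [x / n for x in v]

def matvec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [_dot(row, v) for row in m]

def rotation_between(u, v):
    """Return a 3x3 rotation R such that R @ unit(u) == unit(v)."""
    a, b = _unit(u), _unit(v)
    c = _dot(a, b)  # cosine of the angle between the two directions
    if c < -1.0 + 1e-12:
        # Antiparallel case: rotate by pi about any axis n perpendicular to a,
        # which gives R = 2 n n^T - I.
        helper = [1.0, 0.0, 0.0] if abs(a[0]) < 0.9 else [0.0, 1.0, 0.0]
        n = _unit(_cross(a, helper))
        return [[2.0 * n[i] * n[j] - (1.0 if i == j else 0.0)
                 for j in range(3)] for i in range(3)]
    k = _cross(a, b)
    # Skew-symmetric matrix of k, then R = I + K + K^2 / (1 + c).
    K = [[0.0, -k[2], k[1]], [k[2], 0.0, -k[0]], [-k[1], k[0], 0.0]]
    K2 = [[sum(K[i][m] * K[m][j] for m in range(3)) for j in range(3)]
          for i in range(3)]
    return [[(1.0 if i == j else 0.0) + K[i][j] + K2[i][j] / (1.0 + c)
             for j in range(3)] for i in range(3)]

def relative_transform(bone_a, bone_b):
    """Illustrative (assumed) relative transform between two bones.

    Each bone is (start_joint, end_joint). Passing two bones from the same
    frame gives a spatial relation; passing the same bone from consecutive
    frames gives a temporal one.
    """
    (a0, a1), (b0, b1) = bone_a, bone_b
    rotation = rotation_between(_sub(a1, a0), _sub(b1, b0))
    translation = _sub(b0, a0)
    return rotation, translation

# Usage: the same (hypothetical) bone in two consecutive frames.
bone_t0 = ([0.0, 0.0, 0.0], [1.0, 0.0, 0.0])
bone_t1 = ([0.0, 0.5, 0.0], [0.0, 1.5, 0.0])
R, t = relative_transform(bone_t0, bone_t1)
```

In this sketch the spatial and temporal cases share one function; only the choice of input bones changes.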

Pages: 2129-2140

Page count: 12
