Global Spatio-Temporal Attention for Action Recognition Based on 3D Human Skeleton Data

Cited by: 15

Authors:

Han, Yun [1]

Chung, Sheng-Luen [2]

Xiao, Qiang [3]

Lin, Wei You [2]

Su, Shun-Feng [2]

Affiliations:

[1] Neijiang Normal Univ, Sch Comp Sci, Neijiang 641100, Peoples R China

[2] Natl Taiwan Univ Sci & Technol, Dept Elect Engn, Taipei 10607, Taiwan

[3] Neijiang Normal Univ, Sch Foreign Languages, Neijiang 641100, Peoples R China

Source:

IEEE ACCESS | 2020 / Vol. 8

Keywords:

Human action recognition; global attention model; accumulative learning curve; LSTM; spatio-temporal attention; VISUAL-ATTENTION; NEURAL-NETWORK; LSTM;

DOI:

10.1109/ACCESS.2020.2992740

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Human skeleton joints captured by RGB-D cameras are widely used in action recognition for their robust and comprehensive 3D information. Presently, most action recognition methods based on skeleton joints treat all skeletal joints with the same importance spatially and temporally. However, the contributions of skeletal joints vary significantly. Hence, a GL-LSTM+Diff model is proposed to improve the recognition of human actions. A global spatial attention (GSA) model is proposed to assign different weights to different skeletal joints, providing precise spatial information for human action recognition. The accumulative learning curve (ALC) model is introduced to highlight which frames contribute most to the final decision by giving a varying temporal weight to each intermediate accumulated learning result. By integrating the proposed GSA (for spatial information) and ALC (for temporal processing) models into the LSTM framework and taking the human skeletal joints as inputs, a global spatio-temporal action recognition framework (GL-LSTM) is constructed to recognize human actions. Diff is introduced as the preprocessing method to enhance the dynamics of the features and thereby obtain more distinguishable features for deep learning. Rigorous experiments on the largest dataset, NTU RGB+D, and the common small dataset, SBU, show that the algorithm proposed in this paper outperforms other state-of-the-art methods.


Pages: 88604-88616

Page count: 13

References: 52 in total

