Global Spatio-Temporal Attention for Action Recognition Based on 3D Human Skeleton Data

Cited by: 15
Authors
Han, Yun [1 ]
Chung, Sheng-Luen [2 ]
Xiao, Qiang [3 ]
Lin, Wei You [2 ]
Su, Shun-Feng [2 ]
Affiliations
[1] Neijiang Normal Univ, Sch Comp Sci, Neijiang 641100, Peoples R China
[2] Natl Taiwan Univ Sci & Technol, Dept Elect Engn, Taipei 10607, Taiwan
[3] Neijiang Normal Univ, Sch Foreign Languages, Neijiang 641100, Peoples R China
Keywords
Human action recognition; global attention model; accumulative learning curve; LSTM; spatio-temporal attention; visual attention; neural network
DOI
10.1109/ACCESS.2020.2992740
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
The 3D human skeleton joints captured by RGB-D cameras are widely used in action recognition because they provide robust and comprehensive 3D information. At present, most skeleton-based action recognition methods treat all skeletal joints as equally important both spatially and temporally, even though the contributions of individual joints vary significantly. Hence, a GL-LSTM+Diff model is proposed to improve the recognition of human actions. A global spatial attention (GSA) model is proposed to assign different weights to different skeletal joints, providing precise spatial information for action recognition. An accumulative learning curve (ALC) model is introduced to highlight which frames contribute most to the final decision by assigning varying temporal weights to the intermediate accumulated learning results. By integrating the proposed GSA (for spatial information) and ALC (for temporal processing) models into an LSTM framework and taking the human skeletal joints as input, a global spatio-temporal action recognition framework (GL-LSTM) is constructed to recognize human actions. Diff is introduced as a preprocessing step to enhance the dynamics of the features and thus obtain more distinguishable features for deep learning. Rigorous experiments on the large-scale NTU RGB+D dataset and the smaller SBU dataset show that the proposed algorithm outperforms other state-of-the-art methods.
Pages: 88604-88616
Page count: 13
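
The abstract describes a pipeline of Diff preprocessing, joint-level spatial attention (GSA), an LSTM backbone, and frame-level temporal weighting (ALC). Below is a minimal, hypothetical PyTorch sketch of how such a pipeline could be wired together; it is not the authors' implementation, and the tensor shapes (25 joints and 60 classes as in NTU RGB+D), the reading of Diff as frame-to-frame differencing, and the simple softmax attention scorers are illustrative assumptions.

```python
# Minimal sketch (not the authors' released code) of a GL-LSTM+Diff-style model.
import torch
import torch.nn as nn

class GLLSTMSketch(nn.Module):
    """Illustrative only: Diff preprocessing, global spatial attention over
    joints (GSA), an LSTM backbone, and accumulative temporal weighting (ALC)."""
    def __init__(self, num_joints=25, coord_dim=3, hidden=128, num_classes=60):
        super().__init__()
        self.joint_attn = nn.Linear(coord_dim, 1)            # per-joint score -> spatial weights
        self.lstm = nn.LSTM(num_joints * coord_dim, hidden, batch_first=True)
        self.frame_score = nn.Linear(hidden, 1)              # per-frame score -> temporal weights
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, x):                                    # x: (batch, frames, joints, 3)
        x = x[:, 1:] - x[:, :-1]                             # "Diff": frame-to-frame differences (assumed)
        w_joint = torch.softmax(self.joint_attn(x), dim=2)   # GSA-style weights over joints
        x = (x * w_joint).flatten(2)                         # weighted joints -> (B, T-1, J*3)
        h, _ = self.lstm(x)                                  # intermediate results per frame
        w_frame = torch.softmax(self.frame_score(h), dim=1)  # ALC-style temporal weights
        pooled = (h * w_frame).sum(dim=1)                    # weighted accumulation over time
        return self.classifier(pooled)

# Example: batch of 2 skeleton clips, 30 frames, 25 joints (NTU RGB+D-style layout)
logits = GLLSTMSketch()(torch.randn(2, 30, 25, 3))
print(logits.shape)  # torch.Size([2, 60])
```

The forward pass returns class logits; in the paper, the GSA and ALC modules are presumably more elaborate than the single linear scoring layers used in this sketch.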