Hybrid features for skeleton-based action recognition based on network fusion

Times cited: 4

Authors:

Chen, Zhangmeng [1,2]

Pan, Junjun [1,2]

Yang, Xiaosong [3]

Qin, Hong [4]

Affiliations:

[1] Beihang University, State Key Laboratory of Virtual Reality Technology and Systems, Beijing, People's Republic of China

[2] Peng Cheng Laboratory, Shenzhen, People's Republic of China

[3] Bournemouth University, Faculty of Media and Communication, Poole, Dorset, England

[4] Stony Brook University (SUNY), Department of Computer Science, Stony Brook, NY 11794, USA

Source:

COMPUTER ANIMATION AND VIRTUAL WORLDS | 2020 / Vol. 31 / Issue 4-5

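Funding:

National Science Foundation (USA); National Natural Science Foundation of China; Beijing Natural Science Foundation; National Key Research and Development Program of China;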

Keywords:

action recognition; CNN; human skeleton; hybrid features; LSTM; multistream neural network;

DOI:

10.1002/cav.1952

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification codes:

081202; 0835;

Abstract:

In recent years, skeleton-based human action recognition has attracted significant attention from researchers and practitioners in graphics, vision, animation, and virtual environments. The most fundamental issue is how to learn an effective and accurate representation from spatiotemporal action sequences in order to improve recognition performance, and this article aims to address that challenge. In particular, we design a novel method for extracting hybrid features based on the construction of multistream networks and their organic fusion. First, we train a convolutional neural network (CNN) model to learn CNN-based features, using the raw skeleton coordinates and their temporal differences as input signals. An attention mechanism is incorporated into the CNN model so that more informative and important information receives greater weight. Then, we employ long short-term memory (LSTM) to obtain long-term temporal features from action sequences. Finally, we generate the hybrid features by fusing the CNN and LSTM networks, and we classify action types with these hybrid features. Extensive experiments on several large-scale, publicly available databases yield promising results that demonstrate the effectiveness of our proposed framework.
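The abstract describes the pipeline only at a high level. The Python sketch below illustrates three of its data-flow steps under stated assumptions: building the temporal-difference input from raw joint coordinates, attention-weighted pooling of per-frame features, and fusing two feature streams before classification. The softmax attention, fusion by concatenation, and linear classifier are assumptions chosen for illustration (the abstract does not specify the attention form or the fusion operator), all function names are hypothetical, and the CNN and LSTM streams themselves are not implemented; plain lists stand in for their output feature vectors.

```python
import math
from typing import List

# One frame = J joints, each an (x, y, z) coordinate list.
Frame = List[List[float]]


def temporal_differences(seq: List[Frame]) -> List[Frame]:
    """Motion input: diff[t][j] = seq[t + 1][j] - seq[t][j] for every joint j.

    A T-frame sequence yields T - 1 difference frames.
    """
    return [
        [[b - a for a, b in zip(joint0, joint1)] for joint0, joint1 in zip(f0, f1)]
        for f0, f1 in zip(seq[:-1], seq[1:])
    ]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtracts the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features: List[List[float]], scores: List[float]) -> List[float]:
    """ASSUMED attention form: softmax the per-frame relevance scores, then
    return the weighted sum of the per-frame feature vectors."""
    weights = softmax(scores)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def fuse(cnn_feat: List[float], lstm_feat: List[float]) -> List[float]:
    """ASSUMED fusion operator: concatenate the two stream feature vectors."""
    return cnn_feat + lstm_feat


def classify(hybrid: List[float], weights: List[List[float]],
             biases: List[float]) -> List[float]:
    """Linear layer plus softmax: returns one probability per action class."""
    logits = [sum(w * x for w, x in zip(row, hybrid)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


if __name__ == "__main__":
    # Toy 3-frame sequence with 2 joints.
    seq = [
        [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
        [[1.0, 0.0, 0.0], [1.0, 2.0, 1.0]],
        [[3.0, 0.0, 0.0], [1.0, 2.0, 4.0]],
    ]
    print(temporal_differences(seq))
    # Hypothetical stand-ins for per-frame CNN features and a final LSTM state.
    cnn_feat = attention_pool([[1.0, 2.0], [3.0, 4.0]], [0.0, 0.0])
    lstm_feat = [0.5, -0.5]
    probs = classify(fuse(cnn_feat, lstm_feat),
                     [[1, 0, 0, 0], [0, 1, 0, 0]], [0.0, 0.0])
    print(probs)
```

Concatenation keeps both streams' features intact and leaves it to the classifier to learn how to weigh them; element-wise summation or averaging the two streams' class scores are equally plausible readings of "fusing the CNN and LSTM networks", and the abstract does not say which one the authors use.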

Pages: 11