A Discriminative Deep Model With Feature Fusion and Temporal Attention for Human Action Recognition

Cited by: 33
Authors
Yu, Jiahui [1 ,2 ]
Gao, Hongwei [1 ]
Yang, Wei [1 ]
Jiang, Yueqiu [1 ]
Chin, Weihong [3 ]
Kubota, Naoyuki [3 ]
Ju, Zhaojie [2 ]
Affiliations
[1] Shenyang Ligong Univ, Sch Automat & Elect Engn, Shenyang 110159, Peoples R China
[2] Univ Portsmouth, Sch Comp, Portsmouth PO1 3HE, Hants, England
[3] Tokyo Metropolitan Univ, Grad Sch Syst Design, Tokyo 1910065, Japan
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Real-time systems; Spatiotemporal phenomena; Streaming media; Skeleton; Dynamics; Hidden Markov models; Human action recognition; RGB-D; attention mode; real-time feature fusion; dataset; TRACKING; SYSTEM;
DOI
10.1109/ACCESS.2020.2977856
CLC classification
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Activity recognition, which aims to accurately distinguish human actions in complex environments, plays a key role in human-robot/computer interaction. However, long-lasting and similar actions cause poor feature-sequence extraction and thus reduce recognition accuracy. We propose a novel discriminative deep model (D3D-LSTM) based on 3D-CNN and LSTM for both single-target and interaction action recognition, improving spatiotemporal processing performance. Our model has several notable properties: 1) a real-time feature-fusion method obtains a more representative feature sequence through composition of local mixtures, enhancing the ability to discriminate similar actions; 2) an improved attention mechanism focuses on each frame individually by assigning it a different weight in real time; 3) an alternating optimization strategy is proposed to obtain the best-performing model parameters. Because the proposed D3D-LSTM model is efficient enough to serve as a detector that recognizes various activities, a Real-set database was collected to evaluate action recognition in complex real-world scenarios. For long-term relations, the present memory state is updated via the weight-controlled attention module, which enables the memory cell to store better long-term features. The densely connected bimodal model makes the local perceptrons of the 3D convolutions motion-aware and stores better short-term features. The proposed D3D-LSTM model has been evaluated through a series of experiments on Real-set and on the open-source SBU-Kinect and MSR-Action-3D datasets. Experimental results show that it achieves new state-of-the-art performance, raising the average recognition rate on SBU-Kinect to 92.40% and on MSR-Action-3D to 95.40%.
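The per-frame weighting described in the abstract can be illustrated with a generic temporal-attention step: score each frame's feature vector, normalize the scores with a softmax, and fuse the frames as a weighted sum. This is a minimal pure-Python sketch, not the authors' D3D-LSTM implementation; the dot-product scoring against a learned context vector is an assumption standing in for their weight-controlled attention module.

```python
import math

def temporal_attention(frame_features, context):
    """Fuse a sequence of per-frame feature vectors via softmax attention.

    frame_features: list of T feature vectors (each a list of floats)
    context: hypothetical learned context vector used to score each frame
    Returns (fused_vector, weights), where weights sum to 1.
    """
    # Score each frame by its similarity (dot product) to the context vector.
    scores = [sum(f * c for f, c in zip(feat, context)) for feat in frame_features]
    # Softmax over frames: more relevant frames receive larger weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Fuse the sequence as the attention-weighted sum of frame features.
    dim = len(frame_features[0])
    fused = [sum(w * feat[d] for w, feat in zip(weights, frame_features))
             for d in range(dim)]
    return fused, weights

# Example: the first and third frames align with the context and so
# receive equal, larger weights than the second frame.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
fused, weights = temporal_attention(frames, context=[1.0, 0.0])
```

In the paper's setting the weights would be produced per frame in real time and used to gate what the LSTM memory cell retains; here the softmax-weighted sum only shows the shape of that computation.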
Pages: 43243-43255
Number of pages: 13