A Discriminative Deep Model With Feature Fusion and Temporal Attention for Human Action Recognition

Cited by: 33
Authors
Yu, Jiahui [1 ,2 ]
Gao, Hongwei [1 ]
Yang, Wei [1 ]
Jiang, Yueqiu [1 ]
Chin, Weihong [3 ]
Kubota, Naoyuki [3 ]
Ju, Zhaojie [2 ]
Affiliations
[1] Shenyang Ligong Univ, Sch Automat & Elect Engn, Shenyang 110159, Peoples R China
[2] Univ Portsmouth, Sch Comp, Portsmouth PO1 3HE, Hants, England
[3] Tokyo Metropolitan Univ, Grad Sch Syst Design, Tokyo 1910065, Japan
Funding
National Natural Science Foundation of China;
Keywords
Feature extraction; Real-time systems; Spatiotemporal phenomena; Streaming media; Skeleton; Dynamics; Hidden Markov models; Human action recognition; RGB-D; attention mode; real-time feature fusion; dataset; TRACKING; SYSTEM;
DOI
10.1109/ACCESS.2020.2977856
CLC classification
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
Activity recognition, which aims to accurately distinguish human actions in complex environments, plays a key role in human-robot/computer interaction. However, long-lasting and similar actions cause poor feature-sequence extraction and thus reduce recognition accuracy. We propose a novel discriminative deep model (D3D-LSTM) based on 3D-CNN and LSTM for both single-target and interaction action recognition, improving spatiotemporal processing performance. Our model has several notable properties: 1) a real-time feature-fusion method obtains a more representative feature sequence through composition of local mixtures, enhancing the ability to discriminate similar actions; 2) an improved attention mechanism focuses on each frame individually by assigning it a different weight in real time; 3) an alternating optimization strategy is proposed to obtain the best-performing model parameters. Because the proposed D3D-LSTM model is efficient enough to serve as a detector that recognizes various activities, a Real-set database was collected to evaluate action recognition in complex real-world scenarios. For long-term relations, the present memory state is updated via the weight-controlled attention module, which enables the memory cell to store better long-term features. The densely connected bimodal model makes the local perceptrons of the 3D convolutions motion-aware and stores better short-term features. The proposed D3D-LSTM model has been evaluated through a series of experiments on Real-set and on the open-source SBU-Kinect and MSR-Action-3D datasets. Experimental results show that it achieves new state-of-the-art performance, raising the average recognition rate on SBU-Kinect to 92.40% and on MSR-Action-3D to 95.40%.
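The per-frame weighting described in the abstract can be illustrated with a generic temporal-attention step: score each frame's feature vector, normalize the scores with a softmax, and fuse the frames as a weighted sum. This is a minimal pure-Python sketch, not the authors' D3D-LSTM implementation; the dot-product scoring against a learned context vector is an assumption standing in for their weight-controlled attention module.

```python
import math

def temporal_attention(frame_features, context):
    """Fuse a sequence of per-frame feature vectors via softmax attention.

    frame_features: list of T feature vectors (each a list of floats)
    context: hypothetical learned context vector used to score each frame
    Returns (fused_vector, weights), where weights sum to 1.
    """
    # Score each frame by its similarity (dot product) to the context vector.
    scores = [sum(f * c for f, c in zip(feat, context)) for feat in frame_features]
    # Softmax over frames: more relevant frames receive larger weights.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    # Fuse the sequence as the attention-weighted sum of frame features.
    dim = len(frame_features[0])
    fused = [sum(w * feat[d] for w, feat in zip(weights, frame_features))
             for d in range(dim)]
    return fused, weights

# Example: the first and third frames align with the context and so
# receive equal, larger weights than the second frame.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
fused, weights = temporal_attention(frames, context=[1.0, 0.0])
```

In the paper's setting the weights would be produced per frame in real time and used to gate what the LSTM memory cell retains; here the softmax-weighted sum only shows the shape of that computation.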
Pages: 43243-43255
Number of pages: 13