Human action recognition using two-stream attention based LSTM networks

Cited by: 137
Authors
Dai, Cheng [1]
Liu, Xingang [1]
Lai, Jinfeng [1]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Informat & Commun Engn, Chengdu, Sichuan, Peoples R China
Funding
National Science Foundation (USA);
Keywords
Human action recognition; Visual attention mechanism; LSTM network; Deep feature correlation layer; NEURAL-NETWORKS; MACHINE; IMAGE;
DOI
10.1016/j.asoc.2019.105820
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is well known that different frames play different roles in feature learning for video-based human action recognition. However, most existing deep learning models assign the same weight to different visual and temporal cues during parameter training, which severely degrades the discriminability of the learned features. To address this problem, this paper uses a visual attention mechanism and proposes an end-to-end two-stream attention-based LSTM network. The network selectively focuses on the effective features of the original input images and pays different levels of attention to the outputs of each deep feature map. Moreover, considering the correlation between the two deep feature streams, a deep feature correlation layer is proposed to adjust the network parameters based on a correlation judgement. Finally, we evaluate our approach on three different datasets, and the experimental results show that our proposal achieves state-of-the-art performance in common scenarios. (C) 2019 Elsevier B.V. All rights reserved.
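The contrast the abstract draws between uniform weighting and attention weighting of frames can be illustrated with a minimal sketch. This is not the authors' network: the function names, the toy feature vectors, and the hand-picked relevance scores are all assumptions made for illustration. In a trained model the scores would be produced by learned parameters rather than supplied by hand.

```python
import math

def softmax(scores):
    # Numerically stable softmax: turns arbitrary scores into weights that sum to 1.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def uniform_pool(frame_features):
    # Baseline: every frame contributes equally to the pooled feature.
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

def attention_pool(frame_features, frame_scores):
    # Attention: frames with higher relevance scores contribute more.
    weights = softmax(frame_scores)
    dim = len(frame_features[0])
    pooled = [
        sum(w * f[d] for w, f in zip(weights, frame_features))
        for d in range(dim)
    ]
    return weights, pooled

# Hypothetical per-frame features (3 frames, 2 dimensions) and relevance scores.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
scores = [0.0, 0.0, 2.0]  # assumed: the third frame is judged most informative

weights, pooled = attention_pool(features, scores)
baseline = uniform_pool(features)
```

With equal scores the attention pooling reduces to the uniform average, so uniform weighting is the special case in which no frame is preferred over another.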
Pages: 8