DTA: Double LSTM with Temporal-wise Attention Network for Action Recognition

Times cited: 0
Authors
Xu, Yangyang [1 ,2 ]
Wang, Lei [2 ,3 ]
Cheng, Jun [2 ,3 ]
Xia, Haiying [1 ]
Yin, Jianqin [4 ]
Affiliations
[1] Guangxi Normal Univ, Guilin, Peoples R China
[2] Chinese Acad Sci, Shenzhen Inst Adv Technol, Shenzhen Key Lab Virtual Real & Human Interact Te, Shenzhen, Peoples R China
[3] Chinese Univ Hong Kong, Hong Kong, Hong Kong, Peoples R China
[4] Beijing Univ Posts & Telecommun, Sch Automat, Beijing, Peoples R China
Source
PROCEEDINGS OF 2017 3RD IEEE INTERNATIONAL CONFERENCE ON COMPUTER AND COMMUNICATIONS (ICCC) | 2017
Funding
National Natural Science Foundation of China
Keywords
Action Recognition; CNN; LSTM; Attention Model
DOI
Not available
CLC number (Chinese Library Classification)
TP301 [Theory, Methods]
Subject classification code
081202
Abstract
In this paper, we propose a new architecture for human action recognition that uses a convolutional neural network (CNN) and two Long Short-Term Memory (LSTM) networks with a temporal-wise attention model. We call this network the Double LSTM with Temporal-wise Attention network (DTA). The features extracted by our model capture both spatial and temporal information. The attention model learns which parts of which frames in a video are relevant to the video label and pays more attention to them. We design a joint optimization layer (JOL) to jointly process the two kinds of features produced by the two LSTMs. The proposed network achieves improved performance on three widely used datasets: UCF Sports, UCF11, and HMDB51.
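The abstract does not give the attention equations, so the following is only a rough illustration of what temporal attention pooling over a video can look like. It assumes each frame has already been encoded into a feature vector (for example, a CNN feature passed through an LSTM) and scores every frame with a single hypothetical learned vector `w`. The softmax weights then decide how much each frame contributes to the pooled video descriptor. This is a generic soft-attention sketch, not the DTA formulation: it weights whole frames only, and it does not model the paper's attention over parts of frames or its joint optimization layer (JOL).

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    # Subtract the max score for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_attention(
    frames: Sequence[Sequence[float]], w: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Generic soft attention over time (illustrative sketch, not the DTA equations).

    frames: T per-frame feature vectors h_t (e.g. LSTM hidden states), each of size D.
    w: hypothetical learned scoring vector of size D (an assumption, not from the paper).
    Returns (alpha, context): alpha[t] is the weight of frame t, and
    context = sum_t alpha[t] * h_t is the attention-pooled video feature.
    """
    if not frames:
        raise ValueError("need at least one frame")
    # Relevance score of each frame: dot product between w and h_t.
    scores = [sum(wi * hi for wi, hi in zip(w, h)) for h in frames]
    # Normalize scores over time so the weights sum to 1.
    alpha = softmax(scores)
    dim = len(frames[0])
    # Weighted sum of frame features, one output dimension at a time.
    context = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    return alpha, context


if __name__ == "__main__":
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    alpha, context = temporal_attention(frames, [1.0, 0.0])
    print("weights:", alpha)
    print("pooled feature:", context)
```

With `w = [0.0, 0.0]`, every frame receives the same score, so the weights are uniform and the pooled feature reduces to a plain temporal average. A nonzero `w` shifts weight toward frames whose features align with it. Learning that shift from the video label is what the attention model described in the abstract is meant to do.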
Pages: 1676-1680
Number of pages: 5