Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos

Cited by: 27
Authors
Agethen, Sebastian [1]
Hsu, Winston H. [1]
Affiliations
[1] Natl Taiwan Univ, Taipei 10617, Taiwan
Keywords
Kernel; Videos; Task analysis; Convolution; Feature extraction; YouTube; Mathematical model; Computational and artificial intelligence; neural networks; feedforward neural networks; recurrent neural networks; ACTION RECOGNITION; FUSION;
DOI
10.1109/TMM.2019.2932564
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Action recognition greatly benefits motion understanding in video analysis. Recurrent networks such as long short-term memory (LSTM) networks are a popular choice for motion-aware sequence learning tasks. Recently, a convolutional extension of LSTM was proposed, in which input-to-hidden and hidden-to-hidden transitions are modeled through convolution with a single kernel. This implies an unavoidable trade-off between effectiveness and efficiency. Herein, we propose a new enhancement to convolutional LSTM networks that supports accommodation of multiple convolutional kernels and layers. This resembles a Network-in-LSTM approach, which improves upon the aforementioned concern. In addition, we propose an attention-based mechanism that is specifically designed for our multi-kernel extension. We evaluated our proposed extensions in a supervised classification setting on the UCF-101 and Sports-1M datasets, with the findings showing that our enhancements improve accuracy. We also undertook qualitative analysis to reveal the characteristics of our system and the convolutional LSTM baseline.
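For orientation, the single-kernel baseline that the abstract refers to is commonly written in the following peephole-free form. This is a sketch in conventional notation, not taken from this record: $*$ denotes convolution, $\circ$ the element-wise product, $X_t$ the input frame features, and $H_t$, $C_t$ the hidden and cell states. The last line is only one plausible reading of "multiple convolutional kernels and layers"; the kernels $K^{(j)}$ of differing spatial size and the $1 \times 1$ fusion kernel $V_{x}$ are assumptions introduced for illustration, not the authors' confirmed formulation.

```latex
\begin{align}
i_t &= \sigma\!\left(W_{xi} * X_t + W_{hi} * H_{t-1} + b_i\right) \\
f_t &= \sigma\!\left(W_{xf} * X_t + W_{hf} * H_{t-1} + b_f\right) \\
o_t &= \sigma\!\left(W_{xo} * X_t + W_{ho} * H_{t-1} + b_o\right) \\
g_t &= \tanh\!\left(W_{xg} * X_t + W_{hg} * H_{t-1} + b_g\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ g_t \\
H_t &= o_t \circ \tanh\!\left(C_t\right) \\
% Assumed multi-kernel variant: each single-kernel term W_{x\cdot} * X_t
% is replaced by a small sub-network over several kernel sizes.
W_{x\cdot} * X_t \;&\longrightarrow\;
V_{x\cdot} * \left[\, K^{(1)} * X_t \,;\; \dots \,;\; K^{(n)} * X_t \,\right]
\end{align}
```

Under this reading, a single kernel forces one choice of receptive field per transition (large kernels capture faster motion but cost more), whereas concatenating several kernel sizes and fusing them lets the gates draw on multiple spatial scales at once.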
Pages: 819-829
Page count: 11