Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos

Cited by: 27

Authors:

Agethen, Sebastian [1]

Hsu, Winston H. [1]

Affiliations:

[1] Natl Taiwan Univ, Taipei 10617, Taiwan

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / Issue 03

Keywords:

Kernel; Videos; Task analysis; Convolution; Feature extraction; YouTube; Mathematical model; Computational and artificial intelligence; neural networks; feedforward neural networks; recurrent neural networks; ACTION RECOGNITION; FUSION;

DOI:

10.1109/TMM.2019.2932564

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

Action recognition greatly benefits motion understanding in video analysis. Recurrent networks such as long short-term memory (LSTM) networks are a popular choice for motion-aware sequence learning tasks. Recently, a convolutional extension of LSTM was proposed, in which input-to-hidden and hidden-to-hidden transitions are modeled through convolution with a single kernel. This implies an unavoidable trade-off between effectiveness and efficiency. Herein, we propose a new enhancement to convolutional LSTM networks that supports accommodation of multiple convolutional kernels and layers. This resembles a Network-in-LSTM approach, which improves upon the aforementioned concern. In addition, we propose an attention-based mechanism that is specifically designed for our multi-kernel extension. We evaluated our proposed extensions in a supervised classification setting on the UCF-101 and Sports-1M datasets, with the findings showing that our enhancements improve accuracy. We also undertook qualitative analysis to reveal the characteristics of our system and the convolutional LSTM baseline.


Pages: 819-829

Page count: 11
