Deep Multi-Kernel Convolutional LSTM Networks and an Attention-Based Mechanism for Videos

Cited by: 27

Authors:

Agethen, Sebastian [1]

Hsu, Winston H. [1]

Affiliations:

[1] Natl Taiwan Univ, Taipei 10617, Taiwan

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2020 / Vol. 22 / No. 03

Keywords:

Kernel; Videos; Task analysis; Convolution; Feature extraction; YouTube; Mathematical model; Computational and artificial intelligence; neural networks; feedforward neural networks; recurrent neural networks; ACTION RECOGNITION; FUSION;

DOI:

10.1109/TMM.2019.2932564

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Action recognition greatly benefits motion understanding in video analysis. Recurrent networks such as long short-term memory (LSTM) networks are a popular choice for motion-aware sequence learning tasks. Recently, a convolutional extension of LSTM was proposed, in which input-to-hidden and hidden-to-hidden transitions are modeled through convolution with a single kernel. This implies an unavoidable trade-off between effectiveness and efficiency. Herein, we propose a new enhancement to convolutional LSTM networks that supports accommodation of multiple convolutional kernels and layers. This resembles a Network-in-LSTM approach, which improves upon the aforementioned concern. In addition, we propose an attention-based mechanism that is specifically designed for our multi-kernel extension. We evaluated our proposed extensions in a supervised classification setting on the UCF-101 and Sports-1M datasets, with the findings showing that our enhancements improve accuracy. We also undertook qualitative analysis to reveal the characteristics of our system and the convolutional LSTM baseline.
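
The multi-kernel idea summarized in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract gives no equations, so the layout below is an assumption (several convolutions with different kernel sizes applied to the concatenated input and hidden state, then fused by a 1x1 convolution into the four LSTM gates, with biases omitted). All names, such as `conv2d_same` and `multi_kernel_convlstm_step`, and all sizes are hypothetical.

```python
import numpy as np


def conv2d_same(x, w):
    """Zero-padded 'same' cross-correlation.

    x: (C_in, H, W); w: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))  # zero padding on H and W only
    h, wd = x.shape[1], x.shape[2]
    out = np.zeros((c_out, h, wd))
    for i in range(k):
        for j in range(k):
            # Mix input channels for one kernel offset (i, j).
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def multi_kernel_convlstm_step(x, h, c, kernels, mix):
    """One recurrent step (assumed layout, biases omitted).

    kernels: list of (C_k, C_in + hidden, k, k) arrays, one per kernel size.
    mix:     (4 * hidden, sum of C_k, 1, 1) array, a 1x1 fusion convolution.
    """
    z = np.concatenate([x, h], axis=0)                        # input and hidden state
    feats = np.concatenate([conv2d_same(z, w) for w in kernels], axis=0)
    gates = conv2d_same(feats, mix)                           # fuse kernels into gates
    i, f, o, g = np.split(gates, 4, axis=0)
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)         # standard LSTM cell update
    h_next = sigmoid(o) * np.tanh(c_next)
    return h_next, c_next


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
c_in, hidden, height, width = 3, 4, 8, 8
sizes, per_kernel = (1, 3, 5), 6
kernels = [0.1 * rng.standard_normal((per_kernel, c_in + hidden, k, k)) for k in sizes]
mix = 0.1 * rng.standard_normal((4 * hidden, per_kernel * len(sizes), 1, 1))

h = np.zeros((hidden, height, width))
c = np.zeros((hidden, height, width))
for _ in range(5):  # five random "frames"
    frame = rng.standard_normal((c_in, height, width))
    h, c = multi_kernel_convlstm_step(frame, h, c, kernels, mix)
print(h.shape)
```

With a single entry in `kernels` and no fusion step, this reduces to the single-kernel convolutional LSTM that the abstract takes as its baseline; the attention mechanism mentioned in the abstract is not sketched here.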

Pages: 819-829

Page count: 11
