Attention-Based Bidirectional Recurrent Neural Networks for Description Generation of Videos

Times cited: 0

Authors:

Du, Xiaotong [1]

Yuan, Jiabin [1]

Liu, Hu [1]

Affiliation:

[1] Nanjing University of Aeronautics and Astronautics, College of Computer Science and Technology, Nanjing, Jiangsu, People's Republic of China

Source:

CLOUD COMPUTING AND SECURITY, PT VI | 2018 / Vol. 11068

Keywords:

Video description; Convolutional Neural Networks; Bidirectional Recurrent Neural Networks; Attention mechanism

DOI:

10.1007/978-3-030-00021-9_40

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Describing videos in natural language is important for many applications, such as managing massive online video collections and providing descriptive video service (DVS) for blind people. To improve on existing video description frameworks, this paper presents an end-to-end deep learning model that combines Convolutional Neural Networks (CNNs) and Bidirectional Recurrent Neural Networks (BiRNNs) through a multimodal attention mechanism. First, the model builds richer video representations than comparable studies by combining image, motion, and audio features. Second, a BiRNN encodes these features in both the forward and backward directions. Finally, an attention-based decoder translates the encoder's sequential outputs into a sequence of words. The model is evaluated on the Microsoft Research Video Description Corpus (MSVD). The results demonstrate the necessity of combining BiRNNs with a multimodal attention mechanism and show that the model outperforms other state-of-the-art methods evaluated on this dataset.
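The pipeline summarized in the abstract (per-modality CNN features, a BiRNN encoder, and an attention-based decoder) can be sketched in plain Python as a toy illustration. This is not the authors' implementation and rests on several assumptions: the feature vectors are random stand-ins for CNN outputs, the recurrent cell is a plain tanh RNN rather than whatever gated cell the paper uses, and the multimodal attention is modeled as two levels (Bahdanau-style temporal attention within each modality, then attention across modalities), which is one common design that may differ from the paper's. All names (`birnn_encode`, `multimodal_context`, `greedy_decode`), dimensions, and the vocabulary are hypothetical.

```python
import math
import random

# --- tiny list-based linear algebra (standard library only) ---
def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(*vs):
    return [sum(t) for t in zip(*vs)]

def tanh_vec(v):
    return [math.tanh(x) for x in v]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

# --- BiRNN encoder ---
def rnn_pass(xs, Wx, Wh):
    # Plain tanh cell h_t = tanh(Wx x_t + Wh h_{t-1}), a stand-in for a gated cell.
    h = [0.0] * len(Wh)
    states = []
    for x in xs:
        h = tanh_vec(add(matvec(Wx, x), matvec(Wh, h)))
        states.append(h)
    return states

def birnn_encode(xs, p):
    # Forward pass over xs, backward pass over reversed xs; the backward states
    # are re-reversed so both halves of each output refer to the same time step.
    fwd = rnn_pass(xs, p["Wx_f"], p["Wh_f"])
    bwd = rnn_pass(xs[::-1], p["Wx_b"], p["Wh_b"])[::-1]
    return [f + b for f, b in zip(fwd, bwd)]

# --- attention ---
def additive_attention(s, keys, W, U, v):
    # Bahdanau-style scores e_t = v . tanh(W s + U k_t), normalized with softmax;
    # returns the weights and the weighted sum of the keys (the context vector).
    Ws = matvec(W, s)
    scores = [sum(a * b for a, b in zip(v, tanh_vec(add(Ws, matvec(U, k))))) for k in keys]
    alpha = softmax(scores)
    context = [sum(a * k[i] for a, k in zip(alpha, keys)) for i in range(len(keys[0]))]
    return alpha, context

def multimodal_context(s, encoded, att):
    # Level 1 (assumed): temporal attention within each modality.
    contexts = [additive_attention(s, states, *att[m])[1] for m, states in encoded.items()]
    # Level 2 (assumed): attention across the per-modality context vectors.
    return additive_attention(s, contexts, *att["modality"])

# --- greedy attention-based decoding ---
def greedy_decode(encoded, p, vocab, max_len=6):
    s, prev, words = [0.0] * len(p["Ws"]), vocab.index("<bos>"), []
    for _ in range(max_len):
        _, ctx = multimodal_context(s, encoded, p["att"])
        # New decoder state from previous state, fused context and previous word.
        s = tanh_vec(add(matvec(p["Ws"], s), matvec(p["Wc"], ctx), p["emb"][prev]))
        probs = softmax(matvec(p["Wo"], s))
        # Pick the most probable word, never re-emitting the <bos> token.
        prev = max(range(len(vocab)), key=lambda i: -1.0 if vocab[i] == "<bos>" else probs[i])
        if vocab[prev] == "<eos>":
            break
        words.append(vocab[prev])
    return words

# --- toy setup with hypothetical sizes and random weights ---
rng = random.Random(0)
H, D, A, T = 4, 6, 5, 5  # encoder hidden, decoder state, attention size, time steps
FEATURE_DIMS = {"image": 8, "motion": 6, "audio": 3}
VOCAB = ["<bos>", "<eos>", "a", "man", "is", "cooking"]

# Random vectors standing in for per-step CNN image, motion and audio features.
features = {m: [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)] for m, d in FEATURE_DIMS.items()}
encoded = {}
for m, d in FEATURE_DIMS.items():
    enc_p = {"Wx_f": rand_matrix(H, d, rng), "Wh_f": rand_matrix(H, H, rng),
             "Wx_b": rand_matrix(H, d, rng), "Wh_b": rand_matrix(H, H, rng)}
    encoded[m] = birnn_encode(features[m], enc_p)

att = {k: (rand_matrix(A, D, rng), rand_matrix(A, 2 * H, rng), [rng.uniform(-0.5, 0.5) for _ in range(A)])
       for k in list(FEATURE_DIMS) + ["modality"]}
dec_p = {"Ws": rand_matrix(D, D, rng), "Wc": rand_matrix(D, 2 * H, rng), "att": att,
         "emb": rand_matrix(len(VOCAB), D, rng), "Wo": rand_matrix(len(VOCAB), D, rng)}

beta, fused = multimodal_context([0.0] * D, encoded, att)
print("modality weights:", [round(b, 3) for b in beta])
print("greedy output:", greedy_decode(encoded, dec_p, VOCAB))
```

Because every modality's BiRNN in this sketch uses the same hidden size `H`, the per-modality context vectors have equal length and the second attention level can weight them directly; with differing sizes they would first need projecting to a shared dimension. With random weights the decoded words carry no meaning; the sketch only illustrates the data flow from features through the bidirectional encoder and attention to words.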

Pages: 440-451

Number of pages: 12
