Generating Natural Video Descriptions using Semantic Gate

Cited by: 0

Authors:

Lee, Hyungmin [1]

Kim, Il-Koo [1]

Affiliations:

[1] Samsung Elect, Samsung Res, Seoul, South Korea

Source:

2019 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2019

Keywords:

video captioning; semantic gate; LSTM

DOI:

10.1109/ijcnn.2019.8851892

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The video captioning task aims to generate a textual description of the situation in a video. It is challenging because of the modality difference between video and language. We present a novel method that bridges this gap by using a semantic gate in two ways. First, we develop an activation mechanism that produces a video description capturing the concept of the video. Next, we design a network that evaluates the similarity between visual and sentence features; the semantic gate is used to transform a sentence into a semantic embedding. We also conduct experiments showing that performance on image and action classification tasks transfers to the video captioning task. Experimental results show that our proposed method yields promising improvements over the baseline model and achieves new best results on the MSRVTT and MSVD datasets.


Pages: 7