Generating Natural Video Descriptions using Semantic Gate

Times cited: 0
Authors
Lee, Hyungmin [1]
Kim, Il-Koo [1]
Affiliation
[1] Samsung Electronics, Samsung Research, Seoul, South Korea
Source
2019 International Joint Conference on Neural Networks (IJCNN), 2019
Keywords
video captioning; semantic gate; LSTM
DOI
10.1109/ijcnn.2019.8851892
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The video captioning task aims to generate a textual description of the situation depicted in a video. The task is challenging because of the inherent modality difference between video and language. We present a novel method that bridges this gap by using a semantic gate in two ways. First, we develop an activation mechanism that produces video descriptions capturing the concepts in the video. Second, we design a network that evaluates the similarity between visual features and sentence features. The semantic gate is used to transform a sentence into a semantic embedding. We also conduct experiments showing that performance on image and action classification tasks transfers to the video captioning task. Experimental results show that the proposed method achieves promising improvements over the baseline model. In particular, our model demonstrates its effectiveness by achieving new best results on the MSRVTT and MSVD datasets.
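The abstract does not give the mathematical form of the semantic gate or of the visual-sentence similarity network. Purely as an illustration, the following minimal sketch assumes a common sigmoid-gating pattern: a semantic (concept) vector s produces per-dimension gates g = sigmoid(W s + b) that rescale a feature vector, and two embeddings are compared with cosine similarity. The function names (`semantic_gate`, `cosine_similarity`), the gate formula, and the choice of cosine similarity are hypothetical assumptions, not the authors' published design.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a real number into [0, 1]."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # avoids overflow of exp(-x) for very negative x
    return z / (1.0 + z)


def matvec(w: Matrix, v: Vector) -> Vector:
    """Dense matrix-vector product W v; every row of W must have len(v) entries."""
    for row in w:
        if len(row) != len(v):
            raise ValueError("matrix row length must equal vector length")
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def semantic_gate(features: Vector, semantics: Vector,
                  w_gate: Matrix, b_gate: Vector) -> Vector:
    """Hypothetical semantic gate: g = sigmoid(W s + b), output = g * features.

    Each gate value lies in [0, 1], so the semantic vector s decides how much
    of each feature dimension passes through (assumed formulation).
    """
    if not (len(features) == len(w_gate) == len(b_gate)):
        raise ValueError("features, w_gate rows and b_gate must have the same length")
    pre_activation = matvec(w_gate, semantics)
    gates = [sigmoid(p + b) for p, b in zip(pre_activation, b_gate)]
    return [g * f for g, f in zip(gates, features)]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    video_feature = [1.0, 2.0, 3.0]  # toy visual feature (hypothetical values)
    concept_vector = [0.0, 0.0]      # toy semantic vector (hypothetical values)
    w = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    b = [0.0, 0.0, 0.0]
    gated = semantic_gate(video_feature, concept_vector, w, b)
    print(gated)  # every gate is sigmoid(0) = 0.5 -> [0.5, 1.0, 1.5]
    # The gated vector is parallel to the input, so this is close to 1.0.
    print(cosine_similarity(gated, video_feature))
```

With all-zero weights and biases every gate equals 0.5, so the feature vector is simply halved; learned weights would instead let the semantic vector suppress or pass individual dimensions. A multiplicative sigmoid gate is chosen here only because it is the same mechanism LSTM gates use, which keeps the sketch close to the paper's LSTM setting without claiming to reproduce it.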
Pages: 7