Repeated review based image captioning for image evidence review

Times Cited: 14
Authors
Guan, Jinning [1 ]
Wang, Eric [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Shenzhen Key Lab Internet Informat Collaborat, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Repeated review; Image captioning; Encoder-decoder; Multimodal;
DOI
10.1016/j.image.2018.02.005
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
We propose a repeated-review deep learning model for image captioning in the image evidence review process. It consists of two subnetworks: a convolutional neural network that extracts image features, and a recurrent neural network that decodes those features into captions. Unlike the traditional encoder-decoder model, our model combines the strengths of the two subnetworks by repeatedly recalling visual information during decoding, and then introduces a multimodal layer to fuse image and caption features effectively. The proposed model has been validated on benchmark datasets (MSCOCO, Flickr). Results show that it performs well on BLEU-3 and BLEU-4, to some extent surpassing the best models available today (such as NIC, m-RNN).
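A minimal sketch of the architecture described in the abstract, assuming PyTorch: a CNN feature vector is "re-read" by the RNN decoder at every step (the repeated review of visual information), and a multimodal layer fuses the image and language features before word prediction. The class and layer names (RepeatedReviewCaptioner, img_proj, multimodal), the layer sizes, and the choice of a GRU decoder are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a repeated-review captioner: the image feature is fed
# to the decoder at EVERY step instead of only at t = 0, and a multimodal layer
# fuses image and language features before the word classifier.
import torch
import torch.nn as nn


class RepeatedReviewCaptioner(nn.Module):
    def __init__(self, vocab_size, img_dim=2048, embed_dim=256,
                 hidden_dim=512, multimodal_dim=512):
        super().__init__()
        # Image features are assumed to come from a pretrained CNN encoder
        # (e.g. pooled activations of size img_dim).
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Decoder input = word embedding concatenated with the projected image
        # feature at each step -- the "repeated review" of visual information.
        self.rnn = nn.GRU(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        # Multimodal layer: fuse the RNN hidden state with the image feature.
        self.multimodal = nn.Linear(hidden_dim * 2, multimodal_dim)
        self.classifier = nn.Linear(multimodal_dim, vocab_size)

    def forward(self, img_feats, captions):
        # img_feats: (B, img_dim); captions: (B, T) token ids
        v = torch.tanh(self.img_proj(img_feats))          # (B, hidden_dim)
        w = self.embed(captions)                          # (B, T, embed_dim)
        v_rep = v.unsqueeze(1).expand(-1, w.size(1), -1)  # recall image each step
        h, _ = self.rnn(torch.cat([w, v_rep], dim=-1))    # (B, T, hidden_dim)
        fused = torch.tanh(self.multimodal(torch.cat([h, v_rep], dim=-1)))
        return self.classifier(fused)                     # (B, T, vocab_size)


if __name__ == "__main__":
    model = RepeatedReviewCaptioner(vocab_size=10000)
    feats = torch.randn(2, 2048)               # dummy CNN features
    caps = torch.randint(0, 10000, (2, 12))    # dummy caption tokens
    print(model(feats, caps).shape)            # torch.Size([2, 12, 10000])
```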
Pages: 141-148
Number of Pages: 8