Repeated review based image captioning for image evidence review

Times Cited: 14
Authors
Guan, Jinning [1 ]
Wang, Eric [1 ]
Affiliations
[1] Harbin Inst Technol, Shenzhen Grad Sch, Shenzhen Key Lab Internet Informat Collaborat, Shenzhen 518055, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Repeated review; Image captioning; Encoder-decoder; Multimodal;
DOI
10.1016/j.image.2018.02.005
CLC Classification
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
We propose a repeated-review deep learning model for image captioning in the image evidence review process. It consists of two subnetworks: a convolutional neural network that extracts image features, and a recurrent neural network that decodes those features into captions. Unlike the traditional encoder-decoder model, our model combines the strengths of the two subnetworks by repeatedly recalling visual information during decoding, and then introduces a multimodal layer to fuse image and caption features effectively. The proposed model has been validated on benchmark datasets (MSCOCO, Flickr). Results show that it performs well on BLEU-3 and BLEU-4, to some extent surpassing the best models available today (such as NIC, m-RNN).
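A minimal sketch of the architecture described in the abstract, assuming PyTorch: a CNN feature vector is "re-read" by the RNN decoder at every step (the repeated review of visual information), and a multimodal layer fuses the image and language features before word prediction. The class and layer names (RepeatedReviewCaptioner, img_proj, multimodal), the layer sizes, and the choice of a GRU decoder are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a repeated-review captioner: the image feature is fed
# to the decoder at EVERY step instead of only at t = 0, and a multimodal layer
# fuses image and language features before the word classifier.
import torch
import torch.nn as nn


class RepeatedReviewCaptioner(nn.Module):
    def __init__(self, vocab_size, img_dim=2048, embed_dim=256,
                 hidden_dim=512, multimodal_dim=512):
        super().__init__()
        # Image features are assumed to come from a pretrained CNN encoder
        # (e.g. pooled activations of size img_dim).
        self.img_proj = nn.Linear(img_dim, hidden_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Decoder input = word embedding concatenated with the projected image
        # feature at each step -- the "repeated review" of visual information.
        self.rnn = nn.GRU(embed_dim + hidden_dim, hidden_dim, batch_first=True)
        # Multimodal layer: fuse the RNN hidden state with the image feature.
        self.multimodal = nn.Linear(hidden_dim * 2, multimodal_dim)
        self.classifier = nn.Linear(multimodal_dim, vocab_size)

    def forward(self, img_feats, captions):
        # img_feats: (B, img_dim); captions: (B, T) token ids
        v = torch.tanh(self.img_proj(img_feats))          # (B, hidden_dim)
        w = self.embed(captions)                          # (B, T, embed_dim)
        v_rep = v.unsqueeze(1).expand(-1, w.size(1), -1)  # recall image each step
        h, _ = self.rnn(torch.cat([w, v_rep], dim=-1))    # (B, T, hidden_dim)
        fused = torch.tanh(self.multimodal(torch.cat([h, v_rep], dim=-1)))
        return self.classifier(fused)                     # (B, T, vocab_size)


if __name__ == "__main__":
    model = RepeatedReviewCaptioner(vocab_size=10000)
    feats = torch.randn(2, 2048)               # dummy CNN features
    caps = torch.randint(0, 10000, (2, 12))    # dummy caption tokens
    print(model(feats, caps).shape)            # torch.Size([2, 12, 10000])
```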
Pages: 141-148
Number of Pages: 8