Global-Local Feature Attention Network with Reranking Strategy for Image Caption Generation

Cited by: 2
Authors
Wu, Jie [1]
Xie, Siya [1]
Shi, Xinbao [1]
Chen, Yaowen [2]
Affiliations
[1] Shantou Univ, Coll Engn, 243 Daxue Rd, Shantou, Peoples R China
[2] Shantou Univ, Key Lab Digital Signal & Image Proc Guangdong, 243 Daxue Rd, Shantou, Peoples R China
Source
COMPUTER VISION, PT I | 2017 / Vol. 771
Keywords
Image caption; Global-local feature attention network; Reranking strategy; Candidate captions;
DOI
10.1007/978-981-10-7299-4_13
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, a novel framework, named global-local feature attention network with reranking strategy (GLAN-RS), is presented for the image captioning task. Rather than adopting only unitary visual information as in classical models, GLAN-RS explores an attention mechanism to capture local convolutional salient image maps. Furthermore, we adopt a reranking strategy to adjust the priority of the candidate captions and select the best one. The proposed model is verified on the MSCOCO benchmark dataset across seven standard evaluation metrics. Experimental results show that GLAN-RS significantly outperforms state-of-the-art approaches such as M-RNN and Google NIC, achieving an improvement of 20% in BLEU4 score and 13 points in CIDEr score.
Pages: 157-167
Page count: 11