In this paper, a novel framework, named global-local feature attention network with reranking strategy (GLAN-RS), is presented for the image captioning task. Rather than relying only on unitary visual information as in classical models, GLAN-RS exploits an attention mechanism to capture local convolutional salient image maps. Furthermore, we adopt a reranking strategy to adjust the priority of the candidate captions and select the best one. The proposed model is evaluated on the MSCOCO benchmark dataset across seven standard evaluation metrics. Experimental results show that GLAN-RS significantly outperforms state-of-the-art approaches such as m-RNN and Google NIC, achieving an improvement of 20% in terms of BLEU-4 score and 13 points in terms of CIDEr score.
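As a rough illustration of the global-local fusion and reranking ideas summarized above, a minimal sketch is given below. It is not the authors' implementation; the function names, shapes, the additive fusion, and the length-normalized reranking criterion are all assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def global_local_context(local_feats, global_feat, hidden, W_l, W_h, w_a, W_g):
    """Attend over local conv-map regions, then fuse with the global image feature.

    local_feats: (num_regions, d)  convolutional feature-map locations (assumed shape)
    global_feat: (d,)              pooled CNN feature of the whole image
    hidden:      (h,)              decoder hidden state at the current time step
    """
    # Score each spatial region against the current decoder state.
    scores = np.tanh(local_feats @ W_l + hidden @ W_h) @ w_a   # (num_regions,)
    alpha = softmax(scores)                                    # attention weights
    local_context = alpha @ local_feats                        # attended local feature, (d,)
    # Simple additive fusion of local and global cues (an assumption, not the paper's exact scheme).
    return local_context + global_feat @ W_g

def rerank(candidates):
    """Select the candidate caption with the highest length-normalized log-probability.

    candidates: list of (tokens, log_prob) pairs, e.g. from beam search.
    This is one common reranking criterion, used here purely as a placeholder.
    """
    return max(candidates, key=lambda c: c[1] / max(len(c[0]), 1))[0]
```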