Image captioning via semantic element embedding

Times cited: 20
Authors
Zhang, Xiaodan [1, 2]
He, Shengfeng [3]
Song, Xinhang [1]
Lau, Rynson W. H. [2]
Jiao, Jianbin [1]
Ye, Qixiang [1]
Affiliations
[1] University of Chinese Academy of Sciences, Beijing, People's Republic of China
[2] City University of Hong Kong, Hong Kong, People's Republic of China
[3] South China University of Technology, Guangzhou, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Image captioning; Element embedding; CNN; LSTM
DOI
10.1016/j.neucom.2018.02.112
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Image captioning approaches that rely on global Convolutional Neural Network (CNN) features cannot represent and describe all the important elements of complex scenes. In this paper, we propose semantic element embedding to enrich the semantic representation of images and to update the language model accordingly. For semantic element discovery, an object detection module predicts regions of the image, and a Long Short-Term Memory (LSTM) captioning model generates local descriptions for these regions. The predicted descriptions and categories are used to generate a semantic feature that not only contains detailed information but also shares a word space with the descriptions, thereby bridging the modality gap between visual images and semantic captions. We further integrate the CNN feature with the semantic feature in the proposed Element Embedding LSTM (EE-LSTM) model to predict a language description. Experiments on the MS COCO dataset demonstrate that the proposed approach outperforms conventional captioning methods and can be flexibly combined with baseline models to achieve superior performance. (C) 2019 Published by Elsevier B.V.
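The abstract states that the semantic feature is built from the predicted region descriptions and categories and shares a word space with the captions. The following is a minimal sketch of one way such a feature could be formed, assuming a bag-of-words encoding over the caption vocabulary and fusion with the CNN feature by plain concatenation. `build_vocab`, `semantic_feature`, and `fuse` are hypothetical helpers, not the authors' code; the object detector and region captioner are replaced by hard-coded strings, and the CNN feature is a placeholder vector.

```python
from collections import Counter
from typing import Dict, List, Sequence


def build_vocab(captions: Sequence[str]) -> Dict[str, int]:
    """Hypothetical helper: map each lowercase caption word to an index."""
    vocab: Dict[str, int] = {}
    for caption in captions:
        for word in caption.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def semantic_feature(region_descriptions: List[str],
                     region_categories: List[str],
                     vocab: Dict[str, int]) -> List[float]:
    """Assumed bag-of-words encoding of region descriptions and categories.

    Indices come from the caption vocabulary, so the feature lives in the
    same word space as the target captions. Counts are L1-normalized.
    """
    counts: Counter = Counter()
    for text in region_descriptions + region_categories:
        for word in text.lower().split():
            if word in vocab:  # out-of-vocabulary words are dropped
                counts[vocab[word]] += 1
    total = sum(counts.values())
    feature = [0.0] * len(vocab)
    for idx, count in counts.items():  # loop is empty when total == 0
        feature[idx] = count / total
    return feature


def fuse(cnn_feature: List[float], sem_feature: List[float]) -> List[float]:
    """Assumed fusion by concatenation; EE-LSTM may combine them differently."""
    return cnn_feature + sem_feature


# Toy usage with stand-in outputs of the detector and region captioner.
vocab = build_vocab(["a man riding a horse", "a dog on the grass"])
sem = semantic_feature(["man riding horse"], ["horse", "person"], vocab)
fused = fuse([0.1, 0.2], sem)  # placeholder 2-d "CNN feature"
print(sem)         # [0.0, 0.25, 0.25, 0.5, 0.0, 0.0, 0.0, 0.0]
print(len(fused))  # 10
```

In this toy case "person" is not in the caption vocabulary and is dropped, while "horse" appears in both a description and a category and receives the largest weight. The abstract does not specify how the combined features enter the EE-LSTM decoder, so that step is left out of the sketch.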
Pages: 212-221
Page count: 10