Generating Description with Multi-feature Fusion and Saliency Maps of Image

Cited by: 0

Authors:

Liu, Lisha [1]

Ding, Yuxuan [1]

Tian, Chunna [1]

Yuan, Bo [1]

Affiliations:

[1] Xidian University, School of Electronic Engineering, Video & Image Processing System (VIPS) Lab, Xi'an, Shaanxi, People's Republic of China

Source:

NINTH INTERNATIONAL CONFERENCE ON GRAPHIC AND IMAGE PROCESSING (ICGIP 2017) | 2018 / Vol. 10615

Keywords:

Significant object detection; multi-feature; LSTM (long short term memory); normalization;

DOI:

10.1117/12.2304845

Chinese Library Classification (CLC) number:

O43 [Optics];

Subject classification code:

070207 ; 0803 ;

Abstract:

Generating a description for an image can be regarded as a form of visual understanding. The task spans artificial intelligence, machine learning, natural language processing, and many other areas. In this paper, we present a model that generates descriptions for images based on an RNN (recurrent neural network) with object attention and multiple image features. Deep recurrent neural networks perform very well in machine translation, so we use one to generate natural-sentence descriptions for images. The proposed method uses a single CNN (convolutional neural network) trained on ImageNet to extract image features. However, we believe this feature cannot adequately capture the content of an image, because it may focus only on the object regions. We therefore add scene information to the image feature using a CNN trained on Places205. Experiments show that the model with multiple features extracted by two CNNs performs better than the model with a single feature. In addition, we apply saliency weights to images to emphasize their salient objects. We evaluate our model on MSCOCO using public metrics, and the results show that it performs better than several state-of-the-art methods.
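The abstract does not say how the object and scene features are combined or how the saliency weights are applied. The sketch below is only one plausible reading: it assumes each feature vector is L2-normalized and the two are then concatenated, and that saliency weighting is a per-pixel multiplication. The function names, the `floor` parameter, and the toy values are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a feature vector to unit Euclidean length (zero vectors pass through)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def fuse_features(object_feat, scene_feat):
    """Normalize each feature separately, then concatenate them.

    Concatenation is an assumption; the abstract only says scene
    information is added to the image feature.
    """
    return l2_normalize(object_feat) + l2_normalize(scene_feat)


def apply_saliency(image, saliency, floor=0.5):
    """Weight each pixel by its saliency value in [0, 1].

    `floor` is a hypothetical parameter: it keeps part of the background
    so that non-salient regions are dimmed rather than erased.
    """
    return [
        [pixel * (floor + (1.0 - floor) * s) for pixel, s in zip(img_row, sal_row)]
        for img_row, sal_row in zip(image, saliency)
    ]


# Toy stand-ins for the outputs of the ImageNet CNN and the Places205 CNN.
object_feat = [3.0, 4.0]
scene_feat = [0.0, 2.0]
fused = fuse_features(object_feat, scene_feat)

# Toy one-row grayscale image with a fully salient and a non-salient pixel.
weighted = apply_saliency([[10.0, 10.0]], [[1.0, 0.0]])
```

Normalizing before concatenation keeps either CNN from dominating the fused vector purely because of its activation scale, which is one reason normalization might appear among the keywords. In a full pipeline the fused vector would be passed to the LSTM as the image representation.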


Pages: 8
