Topic-Based Image Caption Generation

Cited by: 14

Authors:

Dash, Sandeep Kumar [1]

Acharya, Shantanu [1]

Pakray, Partha [2]

Das, Ranjita [1]

Gelbukh, Alexander [3]

Affiliations:

[1] NIT Mizoram, Dept CSE, Aizawl, India

[2] NIT Silchar, Dept CSE, Silchar, India

[3] IPN, CIC, Mexico City, DF, Mexico

Source:

ARABIAN JOURNAL FOR SCIENCE AND ENGINEERING | 2020, Vol. 45, No. 04

Keywords:

Image caption generation; Deep learning; Topic modelling; LATENT; SCENE;

DOI:

10.1007/s13369-019-04262-2

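Chinese Library Classification (CLC):

O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];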

Discipline classification codes:

07 ; 0710 ; 09 ;

Abstract:

Image captioning is the task of generating a caption for a given image based on its content. Describing an image well requires extracting as much information from it as possible. Apart from the presence of objects and their relative orientation, the topic of the image, which reflects its purpose, is another vital piece of information that can be incorporated into the model to improve the caption generation system. The aim is to place extra emphasis on the context of the image, imitating the human approach: objects that are merely present but unrelated to the context the image represents should not be part of the generated caption. This work focuses on detecting the topic of the image and using it to guide a novel deep learning-based encoder-decoder framework that generates the caption. The method is compared with earlier state-of-the-art models on results obtained from the MSCOCO 2017 training data set. BLEU, CIDEr, ROUGE-L, and METEOR scores are used to measure the efficacy of the model, and they show an improvement in caption generation performance.

Citation

Pages: 3025-3034

Page count: 10
