Topic-Based Image Caption Generation

Times cited: 14
Authors
Dash, Sandeep Kumar [1]
Acharya, Shantanu [1]
Pakray, Partha [2]
Das, Ranjita [1]
Gelbukh, Alexander [3]
Affiliations
[1] NIT Mizoram, Dept CSE, Aizawl, India
[2] NIT Silchar, Dept CSE, Silchar, India
[3] IPN, CIC, Mexico City, DF, Mexico
Keywords
Image caption generation; Deep learning; Topic modelling; LATENT; SCENE
DOI
10.1007/s13369-019-04262-2
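Chinese Library Classification (CLC) codes
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract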
Image captioning is the task of generating a caption for a given image based on its content. Describing an image well requires extracting as much information from it as possible. Apart from detecting the objects present and their relative orientation, the topic of the image is another vital piece of information that can be incorporated into the model to improve the effectiveness of the caption generation system. The aim is to place extra emphasis on the context of the image, imitating the human approach: objects that are merely present but unrelated to the context of the image should not appear in the generated caption. In this work, the focus is on detecting the topic of the image in order to guide a novel deep learning-based encoder-decoder framework that generates captions for the image. The method is compared with several earlier state-of-the-art models using results obtained on the MSCOCO 2017 training data set. BLEU, CIDEr, ROUGE-L, and METEOR scores are used to measure the efficacy of the model, and they show an improvement in the performance of the caption generation process.
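The abstract does not say how the detected topic steers the decoder. As a rough illustration only, the sketch below assumes one common way to do it: a log-linear interpolation of visual word scores with a topic prior P(word | image) = Σ_k P(k | image) · P(word | k), of the kind an LDA-style topic model could supply. All names (`topic_word_prior`, `rerank`, `alpha`, the topic labels) and the toy numbers are hypothetical; this is not the authors' encoder-decoder.

```python
import math
from typing import Dict, List

# Hypothetical topic model output, P(word | topic), e.g. as an LDA model might learn it.
TOPIC_WORDS: Dict[str, Dict[str, float]] = {
    "sports": {"racket": 0.3, "ball": 0.3, "court": 0.2},
    "traffic": {"car": 0.5, "road": 0.3},
}
# Hypothetical topic mixture inferred for one image, P(topic | image).
TOPIC_MIX: Dict[str, float] = {"sports": 0.9, "traffic": 0.1}
# Hypothetical visual scores for candidate words; "car" is a background object.
VISUAL_SCORES: Dict[str, float] = {"racket": 0.40, "car": 0.45, "ball": 0.15}


def topic_word_prior(word: str, topic_mix: Dict[str, float],
                     topic_words: Dict[str, Dict[str, float]],
                     floor: float = 1e-3) -> float:
    """P(word | image) = sum_k P(k | image) * P(word | k), floored so log() stays finite."""
    p = sum(w * topic_words[k].get(word, 0.0) for k, w in topic_mix.items())
    return max(p, floor)


def rerank(visual_scores: Dict[str, float], topic_mix: Dict[str, float],
           topic_words: Dict[str, Dict[str, float]], alpha: float = 0.5) -> List[str]:
    """Rank words by (1 - alpha) * log P_visual + alpha * log P_topic, best first."""
    combined = {
        word: (1 - alpha) * math.log(score)
        + alpha * math.log(topic_word_prior(word, topic_mix, topic_words))
        for word, score in visual_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


if __name__ == "__main__":
    # Visual evidence only: ['car', 'racket', 'ball']
    print(rerank(VISUAL_SCORES, TOPIC_MIX, TOPIC_WORDS, alpha=0.0))
    # Topic-guided: ['racket', 'ball', 'car']
    print(rerank(VISUAL_SCORES, TOPIC_MIX, TOPIC_WORDS, alpha=0.5))
```

With `alpha=0.0` the ranking follows the visual scores alone, so the off-topic background "car" comes first; with `alpha=0.5` the topic prior pushes it to last place, which is the behaviour the abstract argues for. In a full decoder one would presumably apply such a re-weighting at every time step over the whole vocabulary rather than to three hand-picked words.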
Pages: 3025-3034
Page count: 10