Image Captioning Encoder–Decoder Models Using CNN-RNN Architectures: A Comparative Study

Cited by: 0
Authors
K. Revati Suresh
Arun Jarapala
P. V. Sudeep
Affiliation
[1] National Institute of Technology Calicut, Department of Electronics and Communication Engineering
Source
Circuits, Systems, and Signal Processing | 2022 / Vol. 41
Keywords
Deep learning; Image captioning; Natural language processing; Convolutional neural network; Recurrent neural network
DOI
Not available
Abstract
An image caption generator produces syntactically and semantically correct sentences that describe the scene in a natural image. The neural image caption (NIC) generator is a popular deep learning model for automatically generating image captions in plain English. It combines a convolutional neural network (CNN) encoder with a long short-term memory (LSTM) decoder. This paper investigates the performance of different CNN encoders and recurrent neural network (RNN) decoders to find the best NIC generator model for image captioning. We also test the image caption generators with four image inject models and with two decoding strategies, greedy search and beam search. We conducted experiments on the Flickr8k dataset and analyzed the results qualitatively and quantitatively. Our results show that an automated image caption generator with a ResNet-101 encoder and an LSTM or gated recurrent unit (GRU) decoder outperforms the popular NIC generator when par-inject concatenate conditioning and beam search are used. For quantitative assessment, we used $ROUGE_L$, $CIDEr_D$, and $BLEU_n$ scores to compare the different models.
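The abstract contrasts greedy search with beam search as decoding strategies. The following is a minimal sketch of that difference, not the authors' implementation. The function `toy_next_token_probs` and its probability table are hypothetical stand-ins for a trained CNN-RNN decoder, which would instead compute next-word probabilities from the image features and the recurrent state. Greedy search is treated here as beam search with a beam width of 1.

```python
import math

# Hypothetical toy "decoder": maps a caption prefix (tuple of tokens) to a
# next-token probability distribution. A real captioner would derive this
# from CNN image features and the RNN hidden state.
def toy_next_token_probs(prefix):
    table = {
        (): {"a": 0.6, "the": 0.4},
        ("a",): {"dog": 0.5, "cat": 0.5},
        ("the",): {"dog": 0.9, "cat": 0.1},
    }
    # Any prefix not in the table ends the caption with certainty.
    return table.get(prefix, {"<end>": 1.0})

def beam_search(next_probs, beam_width, max_len=10, end_token="<end>"):
    """Return the best (tokens, log_prob) pair found with the given beam width."""
    beams = [((), 0.0)]  # each beam: (tokens so far, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            # Finished captions are carried over unchanged.
            if tokens and tokens[-1] == end_token:
                candidates.append((tokens, score))
                continue
            # Extend unfinished captions by every possible next token.
            for tok, p in next_probs(tokens).items():
                candidates.append((tokens + (tok,), score + math.log(p)))
        # Keep only the beam_width highest-scoring partial captions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]
        if all(t and t[-1] == end_token for t, _ in beams):
            break
    return beams[0]

greedy_tokens, greedy_score = beam_search(toy_next_token_probs, beam_width=1)
beam_tokens, beam_score = beam_search(toy_next_token_probs, beam_width=2)
print("greedy:", " ".join(greedy_tokens), round(math.exp(greedy_score), 2))
print("beam-2:", " ".join(beam_tokens), round(math.exp(beam_score), 2))
```

In this toy table, greedy search commits to the locally most likely first word and ends with a caption of probability 0.30, while a beam of width 2 keeps the second-best first word alive and finds a caption of probability 0.36. This only illustrates why a wider beam can recover a more probable caption; it says nothing about the magnitude of the gains reported in the paper.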
Pages: 5719-5742
Page count: 23