A Comparative Analysis on Image Caption Generator Using Deep Learning Architecture-ResNet and VGG16

Cited by: 3

Authors:

Neha, V. Sri [1]

Nikhila, B. [1]

Deepika, K. [1]

Subetha, T. [1]

Affiliations:

[1] BVRIT HYDERABAD Coll Engn Women, Hyderabad, India

Source:

COMPUTATIONAL VISION AND BIO-INSPIRED COMPUTING (ICCVBIC 2021) | 2022 / Vol. 1420

Keywords:

Image caption generator; VGG16; ResNet50; Flickr8k dataset; Deep learning;

DOI:

10.1007/978-981-16-9573-5_15

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

An image caption generator produces a caption for a given image by understanding its content. It draws on several computer vision concepts to recognize what the image shows and to express it in English. The challenging part of caption generation is understanding the image and its context and then producing an English description of it. In this work, we compare two deep learning architectures, VGG16 and ResNet50, for understanding the image, with an LSTM generating the corresponding caption. The paper discusses the use of these two architectures for generating captions from photographs. With advances in deep learning techniques, the high-dimensional Flickr8k dataset is used to compare the quality of the generated captions. The Flickr8k dataset has 8000 images, each paired with five different captions that describe the content of the image. The high computational power available to deep learning techniques helps in building models that can generate captions for pictures. The performance of the two architectures is compared using the BLEU score. A widely used application of image caption generators is describing photographs so that blind people can understand them.
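The abstract states that the two architectures are compared by BLEU score, with five reference captions per image. As an illustration only, the following is a minimal sketch of sentence-level BLEU against multiple references (clipped n-gram precision with a brevity penalty). The record does not say which BLEU variant, n-gram order, smoothing, or tokenization the authors used, so the function names, the `max_n` parameter, and the whitespace tokenization here are all assumptions.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision, returned as (matched, total) counts."""
    cand_counts = ngrams(candidate, n)
    # For each n-gram, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return matched, total


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU with uniform weights over 1..max_n-grams.

    Hypothetical helper for illustration; not the authors' evaluation code.
    """
    precisions = []
    for n in range(1, max_n + 1):
        matched, total = modified_precision(candidate, references, n)
        if matched == 0:
            # Without smoothing, one zero precision makes the score zero.
            return 0.0
        precisions.append(matched / total)
    c = len(candidate)
    # Reference length closest to the candidate length (ties: shorter one).
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(sum(math.log(p) for p in precisions) / max_n)
```

With a tokenized generated caption and a list of tokenized reference captions for the same image, `sentence_bleu(candidate, references, max_n=1)` gives BLEU-1 and `max_n=4` gives BLEU-4. Comparing two encoders would then amount to averaging such scores (or computing a corpus-level variant) over the test images for each model.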


Pages: 209-218

Page count: 10

