An Efficient Deep Learning based Hybrid Model for Image Caption Generation

Cited by: 0
Authors
Kaur, Mehzabeen [1 ]
Kaur, Harpreet [1 ]
Affiliations
[1] Punjabi Univ, Dept Comp Sci & Engn, Patiala, India
Keywords
CNN; RNN; LSTM; YOLO
DOI
10.14569/IJACSA.2023.0140326
CLC number
TP301 [Theory and Methods]
Subject classification code
081202
Abstract
In recent years, with the growing use of social media platforms, image captioning has come to play a major role in automatically describing a whole image with a natural language sentence, and it holds significant importance in today's computer-based society. Image captioning is the process of automatically generating a natural language textual description of an image using artificial intelligence techniques, and it rests on two key areas of image processing: computer vision and natural language processing. A Convolutional Neural Network (CNN), a computer vision model, is used for object detection and feature extraction, while Natural Language Processing (NLP) techniques generate the textual caption of the image. Generating a suitable image description by machine is a challenging task, since it requires detecting objects, locating them, and expressing their semantic relationships in a human-understandable language such as English. In this paper, our aim is to develop an encoder-decoder based hybrid image captioning approach using VGG16, ResNet50 and YOLO. VGG16 and ResNet50 are pre-trained feature extraction models trained on millions of images, and YOLO is used for real-time object detection. The approach first extracts image features using VGG16, ResNet50 and YOLO and concatenates the results into a single feature vector. Finally, LSTM and BiGRU are used to generate the textual description of the image. The proposed model is evaluated using BLEU, METEOR and ROUGE scores.
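The pipeline the abstract outlines can be illustrated with a short sketch. Below is a minimal TensorFlow/Keras version of the encoder-decoder idea, assuming a merge-style decoder: VGG16 and ResNet50 act as pre-trained feature extractors whose pooled outputs are concatenated into one fused vector, and an LSTM (or, as a variant, a BiGRU) consumes the partial caption and is merged with the image vector to predict the next word. The vocabulary size, caption length, and the zero placeholder standing in for YOLO detection features are illustrative assumptions, not values taken from the paper.

# Minimal sketch of the hybrid encoder-decoder described in the abstract.
# VOCAB_SIZE, MAX_LEN, and YOLO_DIM are assumed for illustration only.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16, ResNet50
from tensorflow.keras.applications.vgg16 import preprocess_input as vgg_pre
from tensorflow.keras.applications.resnet50 import preprocess_input as res_pre

VOCAB_SIZE = 8000   # assumed vocabulary size
MAX_LEN = 34        # assumed maximum caption length
YOLO_DIM = 80       # placeholder width for YOLO-derived detection features

# Encoders: pre-trained CNNs used as fixed feature extractors.
vgg = VGG16(weights="imagenet", include_top=False, pooling="avg")      # 512-d
resnet = ResNet50(weights="imagenet", include_top=False, pooling="avg")  # 2048-d

def extract_features(image_batch):
    """Concatenate VGG16, ResNet50 (and placeholder YOLO) features."""
    v = vgg.predict(vgg_pre(image_batch.copy()), verbose=0)
    r = resnet.predict(res_pre(image_batch.copy()), verbose=0)
    y = np.zeros((len(image_batch), YOLO_DIM))  # stand-in for YOLO output
    return np.concatenate([v, r, y], axis=1)    # single fused feature vector

# Decoder: merged image and sequence branches predicting the next word.
img_in = layers.Input(shape=(512 + 2048 + YOLO_DIM,))
img_emb = layers.Dense(256, activation="relu")(layers.Dropout(0.5)(img_in))

seq_in = layers.Input(shape=(MAX_LEN,))
seq_emb = layers.Embedding(VOCAB_SIZE, 256, mask_zero=True)(seq_in)
seq_feat = layers.LSTM(256)(seq_emb)
# For the BiGRU variant the abstract mentions, swap the line above for:
# seq_feat = layers.Bidirectional(layers.GRU(128))(seq_emb)

merged = layers.add([img_emb, seq_feat])
out = layers.Dense(VOCAB_SIZE, activation="softmax")(
    layers.Dense(256, activation="relu")(merged))

model = Model(inputs=[img_in, seq_in], outputs=out)
model.compile(loss="sparse_categorical_crossentropy", optimizer="adam")

# Example: fused features for a dummy batch of two 224x224 RGB images.
feats = extract_features((np.random.rand(2, 224, 224, 3) * 255).astype("float32"))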
Pages: 231-237
Number of pages: 7