Deep Neural Networks for Efficient Image Caption Generation

Times cited: 0

Authors:

Rai, Riddhi [1]

Guruprasad, Navya Shimoga [1]

Tumuluru, Shreya Sindhu [1]

Affiliation:

[1] Ramaiah Institute of Technology, Bangalore, Karnataka, India

Source:

Advanced Network Technologies and Intelligent Computing (ANTIC 2023), Part II | 2024 / Vol. 2091

Keywords:

Image Captioning; Deep Learning; CNN; LSTM

DOI:

10.1007/978-3-031-64064-3_18

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

In an era of rapidly advancing technology, the integration of computer vision and natural language processing has emerged as a pivotal area of research, with deep learning playing a central role. Image captioning is the task of generating descriptive textual captions for images. It improves accessibility, assists visually impaired individuals, and enhances human-computer interaction by giving visual content meaningful context. Generating relevant descriptions of high-level image semantics requires not only recognizing objects and scenes but also analyzing their states, attributes, and relationships. This paper investigates combining Convolutional Neural Networks (CNNs), which extract image features, with Long Short-Term Memory (LSTM) networks, which capture the sequential dependencies needed to generate descriptive and coherent captions. The combined model is shown to produce precise and contextually relevant descriptions for a variety of images.
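The CNN-encoder / LSTM-decoder pipeline described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the CNN is replaced by a fixed feature vector (a real system would use, for example, the pooled output of a pretrained CNN), the vocabulary, dimensions, and the `ToyCaptioner` class are made up, the weights are random and untrained, and decoding is greedy. Feeding the projected image feature as the LSTM's first input follows the Show and Tell design of Vinyals et al. (2015); this record does not state how the paper itself conditions the LSTM on the image.

```python
import math
import random

# Hypothetical toy vocabulary; a real system builds this from training captions.
VOCAB = ["<start>", "<end>", "a", "dog", "runs", "on", "grass"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Matrix-vector product with W stored as a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class ToyCaptioner:
    """Hypothetical LSTM decoder conditioned on a CNN image feature (untrained weights)."""

    def __init__(self, feat_dim, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Projects the CNN feature vector into the word-embedding space.
        self.W_img = rand_matrix(embed_dim, feat_dim, rng)
        self.embed = {w: [rng.uniform(-0.1, 0.1) for _ in range(embed_dim)] for w in VOCAB}
        # One weight matrix per LSTM gate (input, forget, output, candidate) acting on [x; h].
        self.W = {g: rand_matrix(hidden_dim, embed_dim + hidden_dim, rng) for g in "ifog"}
        # Maps the hidden state to one score per vocabulary word.
        self.W_out = rand_matrix(len(VOCAB), hidden_dim, rng)

    def lstm_step(self, x, h, c):
        z = x + h  # list concatenation [x; h]
        i = [sigmoid(v) for v in matvec(self.W["i"], z)]
        f = [sigmoid(v) for v in matvec(self.W["f"], z)]
        o = [sigmoid(v) for v in matvec(self.W["o"], z)]
        g = [math.tanh(v) for v in matvec(self.W["g"], z)]
        # Cell state keeps long-range context; hidden state is the gated output.
        c = [fi * ci + ii * gi for fi, ci, ii, gi in zip(f, c, i, g)]
        h = [oi * math.tanh(ci) for oi, ci in zip(o, c)]
        return h, c

    def caption(self, image_feature, max_len=10):
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        # Show-and-Tell style: the projected image feature is the first LSTM input.
        h, c = self.lstm_step(matvec(self.W_img, image_feature), h, c)
        word, words = "<start>", []
        for _ in range(max_len):
            h, c = self.lstm_step(self.embed[word], h, c)
            scores = matvec(self.W_out, h)
            scores[VOCAB.index("<start>")] = float("-inf")  # never emit <start>
            # Greedy decoding: pick the highest-scoring word at each step.
            word = VOCAB[max(range(len(VOCAB)), key=scores.__getitem__)]
            if word == "<end>":
                break
            words.append(word)
        return words


if __name__ == "__main__":
    feature = [0.5] * 8  # stand-in for a pooled CNN feature vector
    model = ToyCaptioner(feat_dim=8, embed_dim=4, hidden_dim=6)
    print(model.caption(feature, max_len=5))
```

Because the weights are untrained, the printed word sequence is arbitrary; the sketch only shows the data flow (image feature, then recurrent word-by-word generation until `<end>` or a length limit). Training would fit the weights with a cross-entropy loss over reference captions, and beam search is a common alternative to the greedy choice shown here.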


Pages: 247-260

Number of pages: 14

References (8):

[1] Alzubi, Jafar A.; Jain, Rachna; Nagrath, Preeti; Satapathy, Suresh; Taneja, Soham; Gupta, Paras. Deep image captioning using an ensemble of CNN and LSTM based deep neural networks. Journal of Intelligent & Fuzzy Systems, 2021, 40(4): 5761-5769.

[2] [Anonymous]. 25 INT C NEURAL INFO, 2012.

[3] Chu, Yan; Yue, Xiao; Yu, Lei; Sergei, Mikhailov; Wang, Zhengkui. Automatic Image Captioning Based on ResNet50 and LSTM with Soft Attention. Wireless Communications & Mobile Computing, 2020, 2020.

[4] Li, Guang; Zhu, Linchao; Liu, Ping; Yang, Yi. Entangled Transformer for Image Captioning. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 8927-8936.

[5] Poddar, Ayush Kumar. Procedia Computer Science, 2023: 686. DOI 10.1016/j.procs.2023.01.049.

[6] Liu, Shuang. MATEC Web of Conferences, 2018, 232. DOI 10.1051/matecconf/201823201052.

[7] Vinyals, O. Proc. CVPR IEEE, 2015: 3156. DOI 10.1109/CVPR.2015.7298935.

[8] Wang, C. arXiv, 2016.
