Improved Framework using Rider Optimization Algorithm for Precise Image Caption Generation

Cited by: 1

Authors:

Chaudhari, Chaitrali Prasanna [1]

Devane, Satish [2]

Affiliations:

[1] Lokmanya Tilak Coll Engn, Sect 4, Navi Mumbai 400709, Maharashtra, India

[2] Karmaveer Adv Baburao Ganpatrao Thakare Coll Engn, Gangapur Rd, Nasik 422013, Maharashtra, India

Source:

INTERNATIONAL JOURNAL OF IMAGE AND GRAPHICS | 2022 / Vol. 22 / Issue 02

Keywords:

Image captioning; CNN; LSTM model; rider optimization; Jaccard similarity; ATTENTION; MODELS; PSO

DOI:

10.1142/S0219467822500218

Chinese Library Classification number:

TP31 [Computer Software]

Subject classification codes:

081202; 0835

Abstract:

"Image captioning is the process of generating a textual description of an image." It draws on both computer vision and natural language processing for caption generation. However, most image captioning systems describe objects only in unclear terms such as "man", "woman", "group of people", or "building". Hence, this paper develops an intelligent image captioning model. The adopted model comprises a few steps, such as word generation, sentence formation, and caption generation. First, the input image is passed to a deep learning classifier, a Convolutional Neural Network (CNN). Because the classifier has already been trained on the words relevant to all images, it can readily classify the words associated with a given image. Next, a set of sentences is formed from the generated words using a Long Short-Term Memory (LSTM) model. The likelihood of each formed sentence is computed using the Maximum Likelihood (ML) function, and the sentences with higher probability are retained and used to describe the visual scene in the form of an image caption. As its major novelty, this paper aims to enhance the performance of the CNN by optimally tuning its weights and activation function. For this optimal selection, it introduces a new enhanced optimization algorithm, Rider with Randomized Bypass and Over-taker update (RR-BOU), which is an enhanced version of the Rider Optimization Algorithm (ROA). Finally, the performance of the proposed captioning model is compared with that of other conventional models through statistical analysis.

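The sentence-selection step described in the abstract (score each candidate sentence by likelihood, keep the more probable ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the per-word probabilities, and the example sentences are hypothetical, and the score used here is simply the sum of log-probabilities that a language model such as an LSTM might assign to the words of each candidate.

```python
import math


def sentence_log_likelihood(token_probs):
    """Sum of log-probabilities of the words in one candidate sentence."""
    return sum(math.log(p) for p in token_probs)


def top_sentences(candidates, k=1):
    """Return the k candidate sentences with the highest log-likelihood.

    `candidates` maps a sentence string to the list of per-word
    probabilities assigned by a language model (assumed values here).
    """
    ranked = sorted(
        candidates,
        key=lambda s: sentence_log_likelihood(candidates[s]),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical candidates with made-up per-word probabilities.
candidates = {
    "a man rides a horse": [0.9, 0.8, 0.7, 0.9, 0.6],
    "a man rides a building": [0.9, 0.8, 0.7, 0.9, 0.05],
}
best = top_sentences(candidates, k=1)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied, and it leaves the ranking unchanged because the logarithm is monotonic.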

Pages: 23

