Improved Framework using Rider Optimization Algorithm for Precise Image Caption Generation

Cited by: 1
Authors
Chaudhari, Chaitrali Prasanna [1]
Devane, Satish [2]
Affiliations
[1] Lokmanya Tilak Coll Engn, Sect 4, Navi Mumbai 400709, Maharashtra, India
[2] Karmaveer Adv Baburao Ganpatrao Thakare Coll Engn, Gangapur Rd, Nasik 422013, Maharashtra, India
Keywords
Image captioning; CNN; LSTM model; rider optimization; Jaccard similarity; ATTENTION; MODELS; PSO
DOI
10.1142/S0219467822500218
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
"Image Captioning is the process of generating a textual description of an image." It combines computer vision and natural language processing to generate captions. However, most image captioning systems describe objects only in vague terms such as "man", "woman", "group of people", or "building". Hence, this paper intends to develop an intelligent image captioning model. The adopted model comprises a few steps: word generation, sentence formation, and caption generation. Initially, the input image is passed to a deep learning classifier, a Convolutional Neural Network (CNN). Since the classifier is already trained on the words relevant to the images, it can readily classify the words associated with the given image. Next, a set of sentences is formed from the generated words using a Long Short-Term Memory (LSTM) model. The likelihood of the formed sentences is computed using the Maximum Likelihood (ML) function, and the sentences with higher probability are retained and used to generate the visual representation of the scene as an image caption. As a major novelty, this paper aims to enhance the performance of the CNN by optimally tuning its weights and activation function. For this optimal selection, the paper introduces a new enhanced optimization algorithm, Rider with Randomized Bypass and Over-taker update (RR-BOU). The proposed RR-BOU is an enhanced version of the Rider Optimization Algorithm (ROA). Finally, the performance of the proposed captioning model is compared with that of conventional models through statistical analysis.
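The sentence-selection step described in the abstract (score each candidate sentence by its likelihood and keep the most probable ones) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the input format (a mapping from each candidate sentence to hypothetical per-token probabilities), and the sample values are all assumptions made for illustration.

```python
import math


def sentence_log_likelihood(token_probs):
    """Sum the log-probabilities of the tokens in one candidate sentence."""
    return sum(math.log(p) for p in token_probs)


def top_k_sentences(candidates, k=1):
    """Return the k candidate sentences with the highest log-likelihood.

    `candidates` maps a sentence to the per-token probabilities that a
    decoder assigned to it (a hypothetical input format, assumed here).
    """
    ranked = sorted(
        candidates,
        key=lambda s: sentence_log_likelihood(candidates[s]),
        reverse=True,
    )
    return ranked[:k]


# Made-up probabilities, for illustration only.
candidates = {
    "a man rides a horse": [0.9, 0.8, 0.7, 0.9, 0.6],
    "a person on an animal": [0.9, 0.3, 0.5, 0.4, 0.2],
    "a man rides a bike": [0.9, 0.8, 0.7, 0.9, 0.1],
}
best = top_k_sentences(candidates, k=2)
print(best)
```

Summing log-probabilities rather than multiplying raw probabilities avoids numerical underflow on long sentences while preserving the ranking.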
Pages: 23