Automatic image captioning system based on augmentation and ranking mechanism

Authors
B. S. Revathi
A. Meena Kowshalya
Affiliation
[1] Government College of Technology,
Keywords
Image captioning; Deep neural network; Encoder-decoder architecture; Ranking LSTM
Abstract
Automatically producing syntactically and semantically accurate captions remains an open research challenge. This paper proposes a pretrained Augmentation–Ranking (A–R) image captioning model that improves image properties and produces appropriate captions. A novel augmentation strategy improves the operation of the convolutional neural network (CNN), while Ranking and Feedback Propagation improve the Long Short-Term Memory (LSTM) network. The model seeks to address complexity, vanishing gradients and context during training. The augmented CNN expands the image dataset used for training, and the Ranking LSTM uses ranks to help identify semantically appropriate captions. Combining the two enhances the image captioning system. The A–R model is examined under maximum and average pooling, using greedy and beam search for decoding. Results are compared with state-of-the-art models such as the bidirectional recurrent neural network, Google NIC and Bi-LSTM combined with a semantic attention mechanism. The model is evaluated on the Flickr8k and Flickr30k datasets using measures including BLEU, METEOR and CIDEr. According to the experimental results, the proposed model, with reduced complexity, generates captions deemed accurate and syntactically and semantically correct, achieving an accuracy of 74.87%, higher than all baseline models.
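The abstract names greedy and beam search as the two decoding strategies but does not describe them. The sketch below is only an illustration of how the two differ, not the authors' implementation: `TOY_MODEL`, `next_probs`, `greedy_search` and `beam_search` are hypothetical names, and the hand-written probability table stands in for an LSTM decoder conditioned on CNN image features.

```python
import math

# Hypothetical toy model (an assumption for illustration): maps a caption
# prefix to next-token probabilities. A real captioner would compute these
# from an LSTM state conditioned on CNN image features.
TOY_MODEL = {
    (): {"a": 0.6, "the": 0.4},
    ("a",): {"dog": 0.5, "cat": 0.5},
    ("the",): {"dog": 0.9, "cat": 0.1},
    ("a", "dog"): {"<end>": 1.0},
    ("a", "cat"): {"<end>": 1.0},
    ("the", "dog"): {"<end>": 1.0},
    ("the", "cat"): {"<end>": 1.0},
}


def next_probs(prefix):
    """Return the next-token distribution for a prefix (list of tokens)."""
    return TOY_MODEL[tuple(prefix)]


def greedy_search(max_len=5):
    """Pick the single most probable token at every step."""
    seq, logp = [], 0.0
    for _ in range(max_len):
        probs = next_probs(seq)
        token = max(probs, key=probs.get)
        logp += math.log(probs[token])
        seq.append(token)
        if token == "<end>":
            break
    return seq, logp


def beam_search(beam_width=2, max_len=5):
    """Keep the beam_width best partial captions at every step."""
    beams = [([], 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, logp in beams:
            for token, p in next_probs(seq).items():
                candidates.append((seq + [token], logp + math.log(p)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, logp in candidates[:beam_width]:
            # Completed captions leave the beam; the rest keep expanding.
            (finished if seq[-1] == "<end>" else beams).append((seq, logp))
        if not beams:
            break
    return max(finished + beams, key=lambda c: c[1])


if __name__ == "__main__":
    print("greedy:", greedy_search())
    print("beam:  ", beam_search(beam_width=2))
```

In this toy table, greedy search commits to "a" because it is the most probable first token, while a beam of width 2 also keeps "the" alive and ends with the more probable full caption. The trade-off is that beam search evaluates roughly `beam_width` times as many candidates per step.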
Pages: 265 - 274
Page count: 9