ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor

Cited by: 3
Authors
Hossen, Md. Bipul [1]
Ye, Zhongfu [1]
Abdussalam, Amr [1]
Hossain, Mohammad Alamgir [1]
Affiliations
[1] Univ Sci & Technol China, Sch Informat Sci & Technol, Hefei 230027, Anhui, Peoples R China
Keywords
Fine-grained image caption; Attention mechanism; Encoder-decoder; Independent attribute predictor; Enhanced attribute predictor
DOI
10.1016/j.displa.2024.102798
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Fine-grained image captioning is a focal point in the vision-to-language task and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective prediction and utilization of attributes play a crucial role in enhancing image captioning performance. Prior attribute-related methods have made progress, but they either focus on predicting attributes related to the input image or concentrate on predicting linguistic context-related attributes at each time step in the language model. These approaches often overlook the importance of balancing visual and linguistic contexts, leading to ineffective exploitation of semantic information and a subsequent decline in performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes related to the input image by leveraging relationships between visual objects and attribute embeddings. Following this, an Enhanced Attribute Predictor (EAP) is proposed, which first predicts linguistic context-related attributes and then uses prior probabilities from the IAP module to rebalance image and linguistic context-related attributes, thereby generating more robust and enhanced attribute probabilities. These refined attributes are then integrated into the language LSTM layer to ensure accurate word prediction at each time step. The integration of the IAP and EAP modules in the proposed image captioning with the enhanced attribute predictor (ICEAP) model effectively incorporates high-level semantic details, enhancing overall model performance. ICEAP outperforms contemporary models, yielding significant average improvements in CIDEr-D scores of 10.62% on MS-COCO, 9.63% on Flickr30K, and 7.74% on Flickr8K using cross-entropy optimization, with qualitative analysis confirming its ability to generate fine-grained captions.
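The abstract does not give the formula the EAP uses to rebalance the two attribute sources, so the following is only an illustrative sketch of the general idea. It assumes each attribute is an independent yes/no variable and fuses the per-step linguistic-context probability with the image-level prior through a product-of-experts rule. The function name `rebalance`, the attribute vocabulary, and all numbers are hypothetical and are not taken from the paper.

```python
def rebalance(context_probs, image_priors, eps=1e-6):
    """Fuse two per-attribute probability lists (product-of-experts rule).

    ASSUMPTION: this fusion rule is a stand-in chosen for illustration;
    the paper's actual rebalancing formula is not stated in the abstract.
    """
    if len(context_probs) != len(image_priors):
        raise ValueError("both inputs must cover the same attribute vocabulary")
    fused = []
    for p_ctx, p_img in zip(context_probs, image_priors):
        # Clamp away from 0 and 1 so the denominator below can never be zero.
        p_ctx = min(max(p_ctx, eps), 1.0 - eps)
        p_img = min(max(p_img, eps), 1.0 - eps)
        present = p_ctx * p_img                  # both sources say "present"
        absent = (1.0 - p_ctx) * (1.0 - p_img)   # both sources say "absent"
        fused.append(present / (present + absent))
    return fused


# Hypothetical three-attribute vocabulary with made-up probabilities.
attributes = ["red", "wooden", "running"]
image_priors = [0.9, 0.5, 0.1]    # stand-in for IAP output (image-level)
context_probs = [0.6, 0.6, 0.6]   # stand-in for the per-step context estimate
enhanced = rebalance(context_probs, image_priors)
```

Under this rule an image prior of 0.5 leaves the context estimate unchanged, a prior above 0.5 raises it, and a prior below 0.5 lowers it, which is one simple way to read "rebalancing" visual and linguistic evidence.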
Pages: 18
Related papers
50 in total
  • [1] Fine-Grained Features for Image Captioning
    Shao, Mengyue
    Feng, Jie
    Wu, Jie
    Zhang, Haixiang
    Zheng, Yayu
    CMC-COMPUTERS MATERIALS & CONTINUA, 2023, 75 (03): : 4697 - 4712
  • [2] Context-Aware Visual Policy Network for Fine-Grained Image Captioning
    Zha, Zheng-Jun
    Liu, Daqing
    Zhang, Hanwang
    Zhang, Yongdong
    Wu, Feng
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2022, 44 (02) : 710 - 722
  • [3] Attribute-Driven Filtering: A new attributes predicting approach for fine-grained image captioning
    Hossen, Md. Bipul
    Ye, Zhongfu
    Abdussalam, Amr
    Ul Hassan, Shabih
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2024, 137
  • [4] Fine-grained person-based image captioning via advanced spectrum parsing
    Wu, Jianhui
    Ni, Fan
    Wang, Zijie
    Ju, Haoyu
    Zhang, Yue
    Hu, Fangqiang
    Li, Yifeng
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (11) : 34015 - 34030
  • [6] FineFormer: Fine-Grained Adaptive Object Transformer for Image Captioning
    Wang, Bo
    Zhang, Zhao
    Fan, Jicong
    Zhao, Mingbo
    Zhan, Choujun
    Xu, Mingliang
    2022 IEEE INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 2022, : 508 - 517
  • [7] c-RNN: A Fine-Grained Language Model for Image Captioning
    Huang, Gengshi
    Hu, Haifeng
    NEURAL PROCESSING LETTERS, 2019, 49 (02) : 683 - 691
  • [9] Fine-grained and Semantic-guided Visual Attention for Image Captioning
    Zhang, Zongjian
    Wu, Qiang
    Wang, Yang
    Chen, Fang
    2018 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2018), 2018, : 1709 - 1717
  • [10] Fine-Grained Image Captioning With Global-Local Discriminative Objective
    Wu, Jie
    Chen, Tianshui
    Wu, Hefeng
    Yang, Zhi
    Luo, Guangchun
    Lin, Liang
    IEEE TRANSACTIONS ON MULTIMEDIA, 2021, 23 : 2413 - 2427