ICEAP: An advanced fine-grained image captioning network with enhanced attribute predictor

Cited by: 3
Authors
Hossen, Md. Bipul [1]
Ye, Zhongfu [1]
Abdussalam, Amr [1]
Hossain, Mohammad Alamgir [1]
Affiliations
[1] University of Science and Technology of China, School of Information Science and Technology, Hefei 230027, Anhui, People's Republic of China
Keywords
Fine-grained image caption; Attention mechanism; Encoder-decoder; Independent attribute predictor; Enhanced attribute predictor
DOI
10.1016/j.displa.2024.102798
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Fine-grained image captioning is a focal point of vision-to-language research and has attracted considerable attention for generating accurate and contextually relevant image captions. Effective prediction and use of attributes play a crucial role in captioning performance. Prior attribute-related methods either predict attributes related to the input image or predict linguistic context-related attributes at each time step of the language model. These approaches often overlook the need to balance visual and linguistic contexts, which leads to ineffective use of semantic information and, in turn, lower performance. To address these issues, an Independent Attribute Predictor (IAP) is introduced to precisely predict attributes related to the input image by leveraging relationships between visual objects and attribute embeddings. An Enhanced Attribute Predictor (EAP) is then proposed; it first predicts linguistic context-related attributes and then uses the prior probabilities from the IAP module to rebalance image-related and linguistic context-related attributes, producing more robust, enhanced attribute probabilities. These refined attributes are fed into the language LSTM layer to support accurate word prediction at each time step. Integrating the IAP and EAP modules in the proposed image captioning with enhanced attribute predictor (ICEAP) model incorporates high-level semantic details and improves overall model performance. ICEAP outperforms contemporary models, yielding significant average improvements in CIDEr-D scores of 10.62% on MS-COCO, 9.63% on Flickr30K and 7.74% on Flickr8K under cross-entropy optimization, and qualitative analysis confirms its ability to generate fine-grained captions.
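The rebalancing step described in the abstract can be illustrated with a small sketch. The abstract does not give the formula the EAP uses, so the convex combination below, the function name rebalance, the weight parameter, and the sample values are all assumptions made for illustration; this is not the authors' implementation.

```python
import math


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def rebalance(image_prior, context_logits, weight=0.5):
    """Blend image-level attribute priors with per-step context probabilities.

    ASSUMPTION: a simple convex combination stands in for the paper's
    rebalancing rule, which the abstract does not specify.
    image_prior    -- per-attribute probabilities (the role played by the IAP)
    context_logits -- per-attribute scores from the linguistic context at one step
    weight         -- hypothetical mixing weight given to the image prior
    """
    if len(image_prior) != len(context_logits):
        raise ValueError("both inputs must cover the same attribute vocabulary")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    context_probs = [sigmoid(z) for z in context_logits]
    return [weight * p + (1.0 - weight) * q
            for p, q in zip(image_prior, context_probs)]


# Hypothetical three-attribute vocabulary and made-up scores.
attributes = ["dog", "red", "running"]
image_prior = [0.9, 0.2, 0.6]
context_logits = [0.0, -2.0, 2.0]
enhanced = rebalance(image_prior, context_logits, weight=0.5)
for name, prob in zip(attributes, enhanced):
    print(f"{name}: {prob:.3f}")
```

In this sketch, a weight close to 1 favors attributes visible in the image, while a weight close to 0 favors attributes suggested by the words generated so far; the enhanced probabilities would then be passed to the language model at that time step.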
Pages: 18