RVAIC: Refined visual attention for improved image captioning

Cited by: 0
Authors
Al-Qatf, Majjed [1, 3]
Hawbani, Ammar [1, 6]
Wang, XingFu [1]
Abdusallam, Amr [2]
Alsamhi, Saeed [3, 4]
Alhabib, Mohammed [5]
Curry, Edward [3]
Affiliations
[1] School of Computer Science and Technology, University of Science and Technology of China, Anhui, Hefei, China
[2] School of Electronic Engineering and Information Science, University of Science and Technology of China, Anhui, Hefei, China
[3] Insight Centre for Data Analytics, University of Galway, Galway, Ireland
[4] Faculty of Engineering, IBB University, IBB, Yemen
[5] School of Computer Science and Engineering, Central South University, Changsha, China
[6] School of Computer Science, Shenyang Aerospace University, Shenyang, China
Source
Journal of Intelligent and Fuzzy Systems | 2024, Vol. 46, No. 2
Funding
Science Foundation Ireland
Keywords
Behavioral research - Image enhancement - Visual languages
DOI
Not available
Abstract
Visual attention has emerged as a prominent approach for improving the effectiveness of image captioning, as it enables the decoder network to focus selectively on the most salient regions in the image content, thereby facilitating the generation of precise and informative captions. Although visual attention achieves the improvement, the small numerical values of its input have a negative impact on its softmax, decreasing its effectiveness. To address this limitation, we propose a refined visual attention (RVA) framework that internally reweights visual attention by leveraging the language context of previously generated words. We first feed the language context into a fully connected layer to obtain appropriate dimensions for the visual features. Then, we use a sigmoid function to obtain a probability distribution to reweight the softmax’s input by applying the multiplication process. Experiments conducted on the MS COCO dataset demonstrate that RVA outperforms traditional visual attention and other existing image captioning methods, highlighting its effectiveness in enhancing the accuracy and informativeness of image captions. © 2024 – IOS Press. All rights reserved.
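The reweighting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the attention scores (the softmax input) are already computed, and that the fully connected layer maps the language context to one gate value per image region. The names `refined_attention`, `W_g`, and `b_g` are hypothetical and do not come from the paper.

```python
import math

def sigmoid(x):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def linear(x, W, b):
    """Fully connected layer: y[j] = sum_i x[i] * W[i][j] + b[j]."""
    return [sum(xi * W[i][j] for i, xi in enumerate(x)) + b[j]
            for j in range(len(b))]

def refined_attention(scores, language_context, W_g, b_g):
    """Gate the softmax input with a sigmoid of the projected language context.

    scores:           one raw attention score per image region
    language_context: vector summarizing previously generated words
    W_g, b_g:         hypothetical gate parameters (assumed shapes:
                      len(language_context) x len(scores), and len(scores))
    """
    # Assumption: one gate value per region, each in (0, 1).
    gate = [sigmoid(g) for g in linear(language_context, W_g, b_g)]
    # Element-wise multiplication reweights the softmax input.
    refined = [s * g for s, g in zip(scores, gate)]
    return softmax(refined), gate

# Toy usage with three regions and a two-dimensional language context.
scores = [0.02, 0.05, 0.01]
context = [1.0, -1.0]
W_g = [[0.5, -0.3, 0.8], [0.1, 0.4, -0.2]]
b_g = [0.0, 0.0, 0.0]
weights, gate = refined_attention(scores, context, W_g, b_g)
```

The returned `weights` still sum to 1, so they can replace ordinary attention weights when forming the attended visual feature. Whether the paper gates per region or per feature dimension is not stated in the abstract, so the per-region choice here is only one possible reading.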
Pages: 3447-3459