RVAIC: Refined visual attention for improved image captioning

Cited by: 0

Authors:

Al-Qatf, Majjed [1,3]

Hawbani, Ammar [1,6]

Wang, XingFu [1]

Abdusallam, Amr [2]

Alsamhi, Saeed [3,4]

Alhabib, Mohammed [5]

Curry, Edward [3]

Affiliations:

[1] School of Computer Science and Technology, University of Science and Technology of China, Anhui, Hefei, China

[2] School of Electronic Engineering and Information Science, University of Science and Technology of China, Anhui, Hefei, China

[3] Insight Centre for Data Analytics, University of Galway, Galway, Ireland

[4] Faculty of Engineering, IBB University, IBB, Yemen

[5] School of Computer Science and Engineering, Central South University, Changsha, China

[6] School of Computer Science, Shenyang Aerospace University, Shenyang, China

Source:

Journal of Intelligent and Fuzzy Systems | 2024 / Vol. 46 / Issue 02

Funding:

Science Foundation Ireland

Keywords:

Behavioral research - Image enhancement - Visual languages

DOI:

Not available

Abstract:

Visual attention has emerged as a prominent approach for improving image captioning: it lets the decoder network focus selectively on the most salient regions of the image, which helps it generate precise and informative captions. Despite this improvement, the small numerical values of the attention input weaken its softmax and reduce its effectiveness. To address this limitation, we propose a refined visual attention (RVA) framework that internally reweights visual attention using the language context of previously generated words. We first feed the language context into a fully connected layer to match the dimensions of the visual features. We then apply a sigmoid function to obtain a probability distribution, which reweights the softmax's input through multiplication. Experiments on the MS COCO dataset show that RVA outperforms traditional visual attention and other existing image captioning methods, demonstrating its effectiveness in improving the accuracy and informativeness of image captions. © 2024 – IOS Press. All rights reserved.
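
The reweighting steps described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the paper's implementation. The abstract does not say how the attention scores are computed or at which point the gate is multiplied in, so the sketch assumes the raw scores are already given (one per image region) and that the fully connected layer produces one gate value per region. The names `refined_attention`, `scores`, `context`, `W`, and `b` are hypothetical.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def refined_attention(scores, context, W, b):
    """Reweight attention scores with a language-context gate.

    scores  : (k,) raw attention scores, one per image region (assumed given)
    context : (d,) language context of previously generated words
    W, b    : hypothetical fully connected layer mapping d -> k (assumption)
    """
    gate = sigmoid(W @ context + b)  # values in (0, 1), one per region
    return softmax(scores * gate)    # the gate multiplies the softmax input


# Toy inputs: 5 image regions, 8-dimensional language context.
rng = np.random.default_rng(0)
k, d = 5, 8
scores = rng.normal(size=k)
context = rng.normal(size=d)
W = rng.normal(size=(k, d))
b = np.zeros(k)

weights = refined_attention(scores, context, W, b)
print(weights, weights.sum())
```

When the gate saturates near 1 for every region, the result reduces to ordinary softmax attention, so in this sketch the language context only changes the attention weights where the gate differs across regions.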

Pages: 3447-3459
