Image captioning improved visual question answering

Cited by: 0

Authors:

Himanshu Sharma

Anand Singh Jalal

Affiliation:

[1] GLA University, Mathura, Department of Computer Engineering and Applications

Source:

Multimedia Tools and Applications | 2022 / Vol. 81, No. 24

Keywords:

Visual question answering (VQA); Image captioning; Computer vision (CV); Natural language processing (NLP)

DOI:

Not available


Abstract:

Visual Question Answering (VQA) and image captioning are both problems that span the Computer Vision (CV) and Natural Language Processing (NLP) domains. In general, computer vision models are effectively used to represent visual content, while NLP algorithms are used to represent sentences. In recent years, VQA and image captioning have been tackled independently, although they require similar types of algorithms. In this paper, a joint relationship between these two tasks is established and exploited. We present an image-captioning-based VQA model that transfers the knowledge learnt from the image captioning task to the VQA task. We integrate the image captioning module into the VQA model by fusing the features obtained from the captioning model with the attention-based visual features. Experimental results show that the proposed model improves answer accuracy over state-of-the-art VQA models by margins of 3.45% on VQA 1.0, 3.33% on VQA 2.0, and 1.73% on VQA-CP v2.
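The abstract describes the caption/visual fusion only at a high level. As a minimal sketch of that data flow, assuming dot-product question-guided attention over image region features and a simple "add, then gate by the question" fusion (both illustrative choices, not the operators reported in the paper), the path from caption features and attended visual features to a predicted answer can be pictured as follows. All function names, toy vectors, and the linear answer classifier are hypothetical.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(regions: List[Vector], question: Vector) -> Vector:
    """Assumed question-guided soft attention: weight regions by softmax(region . question)."""
    weights = softmax([dot(r, question) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[i] for w, r in zip(weights, regions)) for i in range(dim)]


def fuse(visual: Vector, caption: Vector, question: Vector) -> Vector:
    """Hypothetical fusion: add visual and caption features, then gate element-wise by the question."""
    return [q * (v + c) for v, c, q in zip(visual, caption, question)]


def predict(joint: Vector, answer_weights: Dict[str, Vector]) -> str:
    """Return the answer whose (hypothetical) linear classifier row scores highest."""
    return max(answer_weights, key=lambda a: dot(answer_weights[a], joint))


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
    question = [1.0, 0.0]               # toy question embedding
    caption = [0.5, 0.5]                # toy caption-model feature
    joint = fuse(attend(regions, question), caption, question)
    print(predict(joint, {"cat": [1.0, 0.0], "dog": [0.0, 1.0]}))  # prints: cat
```

In a trained model these vectors would come from learned encoders and the classifier weights from training; the sketch only shows where a caption-derived feature could enter alongside the attended visual feature before answer prediction.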


Pages: 34775–34796

Number of pages: 21
