Image captioning improved visual question answering

被引:0
|
作者
Himanshu Sharma
Anand Singh Jalal
机构
[1] GLA University Mathura,Department of Computer Engineering and Applications
来源
关键词
Visual question answering (VQA); Image captioning; Computer vision (CV); Natural language processing (NLP);
D O I
暂无
中图分类号
学科分类号
摘要
Both Visual Question Answering (VQA) and image captioning are the problems which involve Computer Vision (CV) and Natural Language Processing (NLP) domains. In general, computer vision models are effectively utilized to represent visual contents. While NLP algorithms are used to represent the sentences. In recent years, VQA and image captioning tasks are tackled independently although they require similar type of algorithms. In this paper, a joint relationship between these two tasks is established and exploited. We present an image captioning based VQA model that uses the knowledge learnt from the image captioning task and transfers that knowledge to VQA task. We integrate the image captioning module into the VQA model by fusing the features obtained from captioning model and the attention-based visual feature. The experimental results demonstrate the improvement in the answer generation accuracy by a margin 3.45 % on VQA 1.0, 3.33% on VQA 2.0 and 1.73% on VQA-CP v2 datasets over the state-of-the-art VQA models.
引用
收藏
页码:34775 / 34796
页数:21
相关论文
共 50 条
  • [1] Image captioning improved visual question answering
    Sharma, Himanshu
    Jalal, Anand Singh
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (24) : 34775 - 34796
  • [2] Improving Visual Question Answering by Image Captioning
    Shao, Xiangjun
    Dong, Hongsong
    Wu, Guangsheng
    IEEE ACCESS, 2025, 13 : 46299 - 46311
  • [3] A visual question answering model based on image captioning
    Zhou, Kun
    Liu, Qiongjie
    Zhao, Dexin
    MULTIMEDIA SYSTEMS, 2024, 30 (06)
  • [4] Auto-Parsing Network for Image Captioning and Visual Question Answering
    Yang, Xu
    Gao, Chongyang
    Zhang, Hanwang
    Cai, Jianfei
    2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, : 2177 - 2187
  • [5] Relation-Aware Image Captioning for Explainable Visual Question Answering
    Tseng, Ching-Shan
    Lin, Ying-Jia
    Kao, Hung-Yu
    2022 INTERNATIONAL CONFERENCE ON TECHNOLOGIES AND APPLICATIONS OF ARTIFICIAL INTELLIGENCE, TAAI, 2022, : 149 - 154
  • [6] Image Captioning and Visual Question Answering Based on Attributes and External Knowledge
    Wu, Qi
    Shen, Chunhua
    Wang, Peng
    Dick, Anthony
    van den Hengel, Anton
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2018, 40 (06) : 1367 - 1381
  • [7] Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering
    Dong, Xuanyi
    Zhu, Linchao
    Zhang, De
    Yang, Yi
    Wu, Fei
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 54 - 62
  • [8] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
    Anderson, Peter
    He, Xiaodong
    Buehler, Chris
    Teney, Damien
    Johnson, Mark
    Gould, Stephen
    Zhang, Lei
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6077 - 6086
  • [9] Relation-Aware Image Captioning with Hybrid-Attention for Explainable Visual Question Answering
    Lin, Ying-Jia
    Tseng, Ching-Shan
    Kao, Hung-Yu
    JOURNAL OF INFORMATION SCIENCE AND ENGINEERING, 2024, 40 (03) : 649 - 659
  • [10] Image captioning for effective use of language models in knowledge-based visual question answering
    Salaberria, Ander
    Azkune, Gorka
    Lacalle, Oier Lopez de
    Soroa, Aitor
    Agirre, Eneko
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 212