Image captioning improved visual question answering

Authors
Himanshu Sharma
Anand Singh Jalal
Affiliation
[1] GLA University Mathura, Department of Computer Engineering and Applications
Keywords
Visual question answering (VQA); Image captioning; Computer vision (CV); Natural language processing (NLP)
Abstract
Visual Question Answering (VQA) and image captioning are both problems that span the Computer Vision (CV) and Natural Language Processing (NLP) domains. In general, computer vision models are used to represent visual content, while NLP algorithms are used to represent sentences. In recent years, VQA and image captioning have been tackled independently, although they require similar types of algorithms. In this paper, a joint relationship between these two tasks is established and exploited. We present an image-captioning-based VQA model that transfers the knowledge learnt from the image captioning task to the VQA task. We integrate the image captioning module into the VQA model by fusing the features obtained from the captioning model with the attention-based visual feature. The experimental results demonstrate an improvement in answer generation accuracy of 3.45% on VQA 1.0, 3.33% on VQA 2.0, and 1.73% on VQA-CP v2 over state-of-the-art VQA models.
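The abstract describes fusing caption-derived features with an attention-based visual feature but does not say how the fusion is done. The sketch below is only an illustration of that general idea, not the authors' model: it assumes dot-product attention over image regions guided by a question vector, and it assumes concatenation as the fusion step. All names (`attend`, `fuse`, `region_feats`, `question_vec`, `caption_vec`) and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attend(region_feats, question_vec):
    """Question-guided soft attention (assumed form): score each image
    region against the question, then take the weighted sum of regions."""
    weights = softmax([dot(r, question_vec) for r in region_feats])
    dim = len(region_feats[0])
    attended = [
        sum(w * r[i] for w, r in zip(weights, region_feats))
        for i in range(dim)
    ]
    return attended, weights


def fuse(attended_visual, caption_vec):
    """Assumed fusion step: plain concatenation of the two feature vectors.
    The paper may use a different operator."""
    return attended_visual + caption_vec


# Toy inputs: three image regions, one question vector, one caption vector.
region_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question_vec = [2.0, 0.0]
caption_vec = [0.5, -0.5, 0.25]

attended, weights = attend(region_feats, question_vec)
fused = fuse(attended, caption_vec)
print(weights)
print(fused)
```

In a full model, the fused vector would feed an answer classifier; concatenation is chosen here only because it is the simplest operator that preserves both feature sets unchanged.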
Pages: 34775-34796
Number of pages: 21