Visual Question Answering Research on Joint Knowledge and Visual Information Reasoning

被引:0
|
作者
Su, Zhenqiang [1 ]
Gou, Gang [1 ]
机构
[1] State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang,550025, China
关键词
Knowledge representation - Semantics - Visual languages;
D O I
10.3778/j.issn.1002-8331.2209-0456
中图分类号
学科分类号
摘要
As a task in the multimodal field, visual question answering requires fusion and reasoning of the features of different modalities, which has important application value. In traditional visual question answering, the answer to the question can be well reasoned only by relying on the visual information of the image. However, pure visual information cannot meet the diverse question-answering needs in real-world scenarios. Knowledge plays an important role in visual question answering and can well assist question answering. Knowledge-based open visual question answering needs to correlate external knowledge to achieve cross-modal scene understanding. In order to better integrate visual information and related external knowledge, a bilinear structure for joint knowledge and visual information reasoning is proposed, and a dual-guided attention module for knowledge representation by image features and question features is designed. Firstly, the model uses the pre-trained vision-language model to obtain the feature representation and visual reasoning information of the question and image, Secondly, the similarity matrix is used to calculate the image object area under the semantic alignment of the question, and then the regional features after the joint alignment of the question features jointly guide the knowledge representation to obtain knowledge reasoning information. Finally, the visual reasoning information and the knowledge reasoning information are fused to get the final answer. The experimental results on the OK-VQA dataset show that the accuracy of the model is 1.97 percentage points and 4.82 percentage points higher than the two baseline methods, respectively, which verifies the effectiveness of the model. © 2016 Chinese Medical Journals Publishing House Co.Ltd. All rights reserved.
引用
收藏
页码:95 / 102
相关论文
共 50 条
  • [1] Improving reasoning with contrastive visual information for visual question answering
    Long, Yu
    Tang, Pengjie
    Wang, Hanli
    Yu, Jian
    ELECTRONICS LETTERS, 2021, 57 (20) : 758 - 760
  • [2] Multimodal Knowledge Reasoning for Enhanced Visual Question Answering
    Hussain, Afzaal
    Maqsood, Ifrah
    Shahzad, Muhammad
    Fraz, Muhammad Moazam
    2022 16TH INTERNATIONAL CONFERENCE ON SIGNAL-IMAGE TECHNOLOGY & INTERNET-BASED SYSTEMS, SITIS, 2022, : 224 - 230
  • [3] Sequential Visual Reasoning for Visual Question Answering
    Liu, Jinlai
    Wu, Chenfei
    Wang, Xiaojie
    Dong, Xuan
    PROCEEDINGS OF 2018 5TH IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND INTELLIGENCE SYSTEMS (CCIS), 2018, : 410 - 415
  • [4] Explicit Knowledge-based Reasoning for Visual Question Answering
    Wang, Peng
    Wu, Qi
    Shen, Chunhua
    Dick, Anthony
    van den Hengel, Anton
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 1290 - 1296
  • [5] Compositional Substitutivity of Visual Reasoning for Visual Question Answering
    Li, Chuanhao
    Li, Zhen
    Jing, Chenchen
    Wu, Yuwei
    Zhai, Mingliang
    Jia, Yunde
    COMPUTER VISION - ECCV 2024, PT XLVIII, 2025, 15106 : 143 - 160
  • [6] PRIOR VISUAL RELATIONSHIP REASONING FOR VISUAL QUESTION ANSWERING
    Yang, Zhuoqian
    Qin, Zengchang
    Yu, Jing
    Wan, Tao
    2020 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING (ICIP), 2020, : 1411 - 1415
  • [7] Chain of Reasoning for Visual Question Answering
    Wu, Chenfei
    Liu, Jinlai
    Wang, Xiaojie
    Dong, Xuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [8] Research on Visual Question Answering Based on GAT Relational Reasoning
    Miao, Yalin
    Cheng, Wenfang
    He, Shuyun
    Jiang, Hui
    NEURAL PROCESSING LETTERS, 2022, 54 (02) : 1435 - 1448
  • [9] Research on Visual Question Answering Based on GAT Relational Reasoning
    Yalin Miao
    Wenfang Cheng
    Shuyun He
    Hui Jiang
    Neural Processing Letters, 2022, 54 : 1435 - 1448
  • [10] Visual question answering by pattern matching and reasoning
    Zhan, Huayi
    Xiong, Peixi
    Wang, Xin
    Yang, Lan
    NEUROCOMPUTING, 2022, 467 : 323 - 336