Explicit Knowledge Integration for Knowledge-Aware Visual Question Answering about Named Entities

Cited by: 2
Authors
Adjali, Omar [1]
Grimal, Paul [1]
Ferret, Olivier [1]
Ghannay, Sahar [2]
Le Borgne, Herve [1]
Affiliations
[1] Univ Paris Saclay, CEA List, F-91120 Palaiseau, France
[2] Univ Paris Saclay, CNRS, LISN, Palaiseau, France
Keywords
Multimedia retrieval; Knowledge injection
DOI
10.1145/3591106.3592227
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recent years have seen unprecedented growth of interest in Vision-Language tasks, driven by the need to address the inherent challenges of integrating linguistic and visual information in real-world applications. One such task is Visual Question Answering (VQA), which aims to answer questions about visual content. The limitations of the VQA task in terms of question redundancy and poor linguistic variability encouraged researchers to propose Knowledge-aware Visual Question Answering tasks as a natural extension of VQA. In this paper, we tackle the KVQAE (Knowledge-based Visual Question Answering about named Entities) task, which consists of answering questions about named entities defined in a knowledge base and grounded in visual content. In particular, besides the textual and visual information, we propose to leverage the structural information extracted from syntactic dependency trees and external knowledge graphs to help answer questions about a large spectrum of entities of various types. By combining contextual and graph-based representations using Graph Convolutional Networks (GCNs), we are able to learn meaningful embeddings for Information Retrieval tasks. Experiments on the public ViQuAE dataset show that our approach improves over state-of-the-art baselines and demonstrate the value of injecting external knowledge to enhance multimodal information retrieval.
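The abstract mentions Graph Convolutional Networks (GCNs) as the way graph-based representations are built. As a rough illustration only, the sketch below shows one generic graph-convolution step, ReLU(D^{-1/2} (A + I) D^{-1/2} H W), in plain Python. This is not the authors' implementation: the function names, the toy three-node graph, the feature values, and the identity weight matrix are all assumptions made for this example.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    # Node degrees computed on the self-loop-augmented graph.
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(adj, features, weight):
    """One graph-convolution step: ReLU(A_norm @ H @ W)."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical toy graph: three nodes on a path 0 - 1 - 2.
adj = [[0.0, 1.0, 0.0],
       [1.0, 0.0, 1.0],
       [0.0, 1.0, 0.0]]
# Hypothetical 2-dimensional node features (e.g. contextual embeddings).
features = [[1.0, 0.0],
            [0.0, 1.0],
            [1.0, 1.0]]
# Identity weights, so the output shows the effect of neighbor averaging alone.
weight = [[1.0, 0.0],
          [0.0, 1.0]]

out = gcn_layer(adj, features, weight)
for row in out:
    print([round(v, 4) for v in row])
```

Each output row mixes a node's own features with those of its neighbors, which is the sense in which such a layer can combine per-node contextual representations with graph structure. A real model would learn `weight` and typically stack several such layers.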
Pages: 29-38
Page count: 10
Related papers
50 in total
  • [1] KVQA: Knowledge-Aware Visual Question Answering
    Shah, Sanket
    Mishra, Anand
    Yadati, Naganand
    Talukdar, Partha Pratim
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 8876 - 8884
  • [2] ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities
    Lerner, Paul
    Ferret, Olivier
    Guinaudeau, Camille
    Le Borgne, Herve
    Besancon, Romaric
    Moreno, Jose G.
    Melgarejo, Jesus Lovon
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 3108 - 3120
  • [3] Gathering Knowledge for Question Answering Beyond Named Entities
    Przybyla, Piotr
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS, NLDB 2015, 2015, 9103 : 412 - 417
  • [4] VQA as a factoid question answering problem: A novel approach for knowledge-aware and explainable visual question answering
    Narayanan, Abhishek
    Rao, Abijna
    Prasad, Abhishek
    Natarajan, S.
    IMAGE AND VISION COMPUTING, 2021, 116
  • [5] NEWSKVQA: Knowledge-Aware News Video Question Answering
    Gupta, Pranay
    Gupta, Manish
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2022, PT III, 2022, 13282 : 3 - 15
  • [6] KnowReQA: A Knowledge-aware Retrieval Question Answering System
    Wang, Chuanrui
    Bai, Jun
    Zhang, Xiaofeng
    Yan, Cen
    Ouyang, Yuanxin
    Rong, Wenge
    Xiong, Zhang
    KNOWLEDGE SCIENCE, ENGINEERING AND MANAGEMENT, PT I, 2022, 13368 : 709 - 721
  • [7] Improving Knowledge-Aware Dialogue Generation via Knowledge Base Question Answering
    Wang, Jian
    Liu, Junhao
    Bi, Wei
    Liu, Xiaojiang
    He, Kejing
    Xu, Ruifeng
    Yang, Min
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 9169 - 9176
  • [8] Knowledge-aware adaptive graph network for commonsense question answering
    Kang, Long
    Li, Xiaoge
    An, Xiaochun
    JOURNAL OF INTELLIGENT INFORMATION SYSTEMS, 2024, 62 (05) : 1305 - 1324
  • [9] Knowledge-aware image understanding with multi-level visual representation enhancement for visual question answering
    Yan, Feng
    Li, Zhe
    Silamu, Wushour
    Li, Yanbing
    MACHINE LEARNING, 2024, 113 (06) : 3789 - 3805