Knowledge-Augmented Visual Question Answering With Natural Language Explanation

Cited by: 4
Authors
Xie, Jiayuan [1]
Cai, Yi [2, 3]
Chen, Jiali [2, 3]
Xu, Ruohang [2, 3]
Wang, Jiexin [2, 3]
Li, Qing [1]
Affiliations
[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China
[2] South China Univ Technol, Sch Software Engn, Guangzhou 510006, Peoples R China
[3] South China Univ Technol, Key Lab Big Data & Intelligent Robot, Minist Educ, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Task analysis; Visualization; Feature extraction; Question answering (information retrieval); Iterative methods; Predictive models; Natural languages; Visual question answering; natural language explanation; multimodal;
DOI
10.1109/TIP.2024.3379900
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires models to not only generate accurate answers but also to provide explanations that justify the relevant decision-making processes. This task is accomplished by generating natural language sentences based on the given question-image pair. However, existing methods often struggle to ensure consistency between the answers and explanations due to their disregard of the crucial interactions between these factors. Moreover, existing methods overlook the potential benefits of incorporating additional knowledge, which hinders their ability to effectively bridge the semantic gap between questions and images, leading to less accurate explanations. In this paper, we present a novel approach denoted the knowledge-based iterative consensus VQA-NLE (KICNLE) model to address these limitations. To maintain consistency, our model incorporates an iterative consensus generator that adopts a multi-iteration generative method, enabling multiple iterations of the answer and explanation in each generation. In each iteration, the current answer is utilized to generate an explanation, which in turn guides the generation of a new answer. Additionally, a knowledge retrieval module is introduced to provide potentially valid candidate knowledge, guide the generation process, effectively bridge the gap between questions and images, and enable the production of high-quality answer-explanation pairs. Extensive experiments conducted on three different datasets demonstrate the superiority of our proposed KICNLE model over competing state-of-the-art approaches. Our code is available at https://github.com/Gary-code/KICNLE.
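The alternation described in the abstract (the current answer conditions an explanation, which in turn guides a new answer) can be illustrated with a minimal sketch. This is not the KICNLE implementation from the authors' repository: the function names `iterative_consensus`, `toy_explain`, and `toy_answer`, the fixed iteration count, and the rule-based stand-ins for the generators are all hypothetical, and image features are omitted for brevity.

```python
from typing import Callable, List, Tuple


def iterative_consensus(
    question: str,
    knowledge: List[str],
    explain: Callable[[str, str, List[str]], str],
    answer: Callable[[str, str, List[str]], str],
    initial_answer: str = "",
    num_iterations: int = 3,
) -> Tuple[str, str]:
    """Alternate explanation and answer generation for a fixed number of rounds.

    Hypothetical illustration only; the real model uses learned generators.
    """
    current_answer = initial_answer
    explanation = ""
    for _ in range(num_iterations):
        # The current answer conditions the explanation ...
        explanation = explain(question, current_answer, knowledge)
        # ... and the explanation in turn guides a new answer.
        current_answer = answer(question, explanation, knowledge)
    return current_answer, explanation


# Toy rule-based stand-ins (assumptions) so the loop can run end to end.
def toy_explain(question: str, current_answer: str, knowledge: List[str]) -> str:
    fact = knowledge[0] if knowledge else "no fact retrieved"
    return f"because {fact}"


def toy_answer(question: str, explanation: str, knowledge: List[str]) -> str:
    return "yes" if "umbrella" in explanation else "unknown"


final_answer, final_explanation = iterative_consensus(
    "Is it raining?", ["people hold an umbrella"], toy_explain, toy_answer
)
```

In this sketch the retrieved knowledge is simply passed to both generators on every round, which is one plausible reading of "guide the generation process"; the abstract does not specify how the knowledge is injected.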
Pages: 2652-2664
Page count: 13
Related papers
50 in total
  • [1] Knowledge-Augmented Visual Question Answering With Natural Language Explanation
    Xie, Jiayuan
    Cai, Yi
    Chen, Jiali
    Xu, Ruohang
    Wang, Jiexin
    Li, Qing
    IEEE Transactions on Image Processing, 2024, 33 : 2652 - 2664
  • [2] Select, Substitute, Search: A New Benchmark for Knowledge-Augmented Visual Question Answering
    Jain, Aman
    Kothyari, Mayank
    Kumar, Vishwajeet
    Jyothi, Preethi
    Ramakrishnan, Ganesh
    Chakrabarti, Soumen
    SIGIR '21 - PROCEEDINGS OF THE 44TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2021, : 2491 - 2498
  • [3] The Second Workshop on Knowledge-Augmented Methods for Natural Language Processing
    Yu, Wenhao
    Tong, Lingbo
    Shi, Weijia
    Peng, Nanyun
    Jiang, Meng
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5899 - 5900
  • [4] Knowledge-Augmented Language Model Verification
    Baek, Jinheon
    Jeong, Soyeong
    Kang, Minki
    Park, Jong C.
    Hwang, Sung Ju
    2023 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING, EMNLP 2023, 2023, : 1720 - 1736
  • [5] Multimodal Natural Language Explanation Generation for Visual Question Answering Based on Multiple Reference Data
    Zhu, He
    Togo, Ren
    Ogawa, Takahiro
    Haseyama, Miki
    ELECTRONICS, 2023, 12 (10)
  • [6] RAVL: A Retrieval-Augmented Visual Language Model Framework for Knowledge-Based Visual Question Answering
    Chai, Naiquan
    Zou, Dongsheng
    Liu, Jiyuan
    Wang, Hao
    Yang, Yuming
    Song, Xinyi
    NATURAL LANGUAGE PROCESSING AND CHINESE COMPUTING, PT III, NLPCC 2024, 2025, 15361 : 394 - 406
  • [7] Implicit knowledge-augmented prompting for commonsense explanation generation
    Ge, Yan
    Yu, Hai-Tao
    Lei, Chao
    Liu, Xin
    Jatowt, Adam
    Kim, Kyoung-sook
    Lynden, Steven
    Matono, Akiyoshi
    KNOWLEDGE AND INFORMATION SYSTEMS, 2025, : 3663 - 3698
  • [8] KALA: Knowledge-Augmented Language Model Adaptation
    Kang, Minki
    Baek, Jinheon
    Hwang, Sung Ju
    NAACL 2022: THE 2022 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES, 2022, : 5144 - 5167
  • [9] Leveraging Knowledge Graph Embeddings for Natural Language Question Answering
    Wang, Ruijie
    Wang, Meng
    Liu, Jun
    Chen, Weitong
    Cochez, Michael
    Decker, Stefan
    DATABASE SYSTEMS FOR ADVANCED APPLICATIONS (DASFAA 2019), PT I, 2019, 11446 : 659 - 675
  • [10] Interactive natural language question answering over knowledge graphs
    Zheng, Weiguo
    Cheng, Hong
    Yu, Jeffrey Xu
    Zou, Lei
    Zhao, Kangfei
    INFORMATION SCIENCES, 2019, 481 : 141 - 159