Knowledge-Augmented Visual Question Answering With Natural Language Explanation

Cited by: 4

Authors:

Xie, Jiayuan [1]

Cai, Yi [2,3]

Chen, Jiali [2,3]

Xu, Ruohang [2,3]

Wang, Jiexin [2,3]

Li, Qing [1]

Affiliations:

[1] Hong Kong Polytech Univ, Dept Comp, Hong Kong, Peoples R China

[2] South China Univ Technol, Sch Software Engn, Guangzhou 510006, Peoples R China

[3] South China Univ Technol, Key Lab Big Data & Intelligent Robot, Minist Educ, Guangzhou 510006, Peoples R China

Source:

IEEE TRANSACTIONS ON IMAGE PROCESSING | 2024 / Vol. 33

Funding:

National Natural Science Foundation of China;

Keywords:

Task analysis; Visualization; Feature extraction; Question answering (information retrieval); Iterative methods; Predictive models; Natural languages; Visual question answering; natural language explanation; multimodal;

DOI:

10.1109/TIP.2024.3379900

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual question answering with natural language explanation (VQA-NLE) is a challenging task that requires models to not only generate accurate answers but also to provide explanations that justify the relevant decision-making processes. This task is accomplished by generating natural language sentences based on the given question-image pair. However, existing methods often struggle to ensure consistency between the answers and explanations due to their disregard of the crucial interactions between these factors. Moreover, existing methods overlook the potential benefits of incorporating additional knowledge, which hinders their ability to effectively bridge the semantic gap between questions and images, leading to less accurate explanations. In this paper, we present a novel approach denoted the knowledge-based iterative consensus VQA-NLE (KICNLE) model to address these limitations. To maintain consistency, our model incorporates an iterative consensus generator that adopts a multi-iteration generative method, enabling multiple iterations of the answer and explanation in each generation. In each iteration, the current answer is utilized to generate an explanation, which in turn guides the generation of a new answer. Additionally, a knowledge retrieval module is introduced to provide potentially valid candidate knowledge, guide the generation process, effectively bridge the gap between questions and images, and enable the production of high-quality answer-explanation pairs. Extensive experiments conducted on three different datasets demonstrate the superiority of our proposed KICNLE model over competing state-of-the-art approaches. Our code is available at https://github.com/Gary-code/KICNLE.
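The iterative answer-explanation loop described in the abstract can be sketched roughly as follows. This is only an illustrative outline, not the authors' KICNLE implementation: the function `iterative_consensus`, its parameters, the two toy generator stubs, and the early stop when the answer no longer changes are all assumptions made for this example. Image features are omitted for brevity, and the retrieved knowledge is represented as a plain list of strings.

```python
from typing import Callable, List, Tuple

# Hypothetical generator signature: (question, conditioning text, knowledge) -> text
Generator = Callable[[str, str, List[str]], str]


def iterative_consensus(
    question: str,
    knowledge: List[str],
    generate_explanation: Generator,
    generate_answer: Generator,
    initial_answer: str = "",
    num_iterations: int = 3,
) -> Tuple[str, str]:
    """Alternate between explanation and answer generation.

    In each iteration the current answer conditions a new explanation,
    and that explanation conditions a new answer. Stopping early when the
    answer is unchanged is an assumption of this sketch.
    """
    answer = initial_answer
    explanation = ""
    for _ in range(num_iterations):
        explanation = generate_explanation(question, answer, knowledge)
        new_answer = generate_answer(question, explanation, knowledge)
        if new_answer == answer:
            break  # answer and explanation agree; stop iterating
        answer = new_answer
    return answer, explanation


# Toy stand-ins for the learned generators (purely illustrative).
def toy_explainer(question: str, answer: str, knowledge: List[str]) -> str:
    fact = knowledge[0] if knowledge else "no retrieved fact"
    return f"because {fact}"


def toy_answerer(question: str, explanation: str, knowledge: List[str]) -> str:
    return "yes" if "umbrella" in explanation else "no"


result = iterative_consensus(
    "Is it raining?",
    ["people hold an umbrella"],
    toy_explainer,
    toy_answerer,
)
print(result)
```

In a real system the two stubs would be replaced by a trained multimodal generator; the released code at the URL above is the authoritative reference for how the model actually performs these steps.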

Pages: 2652 - 2664

Page count: 13
