Relation-Aware Image Captioning for Explainable Visual Question Answering

Cited by: 1
Authors
Tseng, Ching-Shan [1]
Lin, Ying-Jia [1]
Kao, Hung-Yu [1]
Affiliations
[1] Natl Cheng Kung Univ, Dept Comp Sci & Informat Engn, Intelligent Knowledge Management Lab, Tainan, Taiwan
Keywords
visual question answering; image captioning; explainable VQA; cross-modality learning; multi-task learning;
DOI
10.1109/TAAI57707.2022.00035
CLC Number
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent studies leveraging object detection models for Visual Question Answering (VQA) ignore the correlations or interactions between multiple objects. In addition, previous VQA models are black boxes to humans, so it is difficult to explain why a model returns a correct or wrong answer. To address these problems, we propose a new model structure that incorporates image captioning into the VQA task. Our model constructs a relation graph from the relative positions of region pairs and then produces relation-aware visual features with a relation encoder. To make the predictions explainable, we introduce an image captioning module and train it jointly with the VQA objective in a multi-task setup. Meanwhile, the generated captions are injected into the predictor to assist cross-modal understanding. Experiments show that our model generates meaningful answers and explanations according to the questions and images. In addition, the relation encoder and the caption-attended predictor lead to improvements on different types of questions.
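The abstract outlines a concrete pipeline: a relation graph built from the relative positions of detected region pairs, a relation encoder that turns region features into relation-aware visual features, and a caption-attended answer predictor trained jointly with a captioning head. As a reading aid, the sketch below shows one plausible way to build such a spatial relation graph from bounding boxes; the relation categories (inside, cover, overlap, eight directional bins), the IoU threshold, and the function names are illustrative assumptions, not the authors' implementation.

# Illustrative sketch (not the authors' released code): constructing a spatial
# relation graph from the bounding boxes of detected regions. The 11 relation
# categories and the IoU threshold are assumptions in the spirit of common
# position-based relation encoders.
import math
import numpy as np

def spatial_relation(box_i, box_j, iou_thresh=0.5):
    """Relation label for the ordered region pair (i -> j).

    Boxes are (x1, y1, x2, y2). Labels: 1 = j inside i, 2 = j covers i,
    3 = strong overlap, 4..11 = one of eight directional bins.
    """
    xi1, yi1, xi2, yi2 = box_i
    xj1, yj1, xj2, yj2 = box_j
    if xj1 >= xi1 and yj1 >= yi1 and xj2 <= xi2 and yj2 <= yi2:
        return 1                                   # j lies inside i
    if xi1 >= xj1 and yi1 >= yj1 and xi2 <= xj2 and yi2 <= yj2:
        return 2                                   # j covers i
    ix1, iy1 = max(xi1, xj1), max(yi1, yj1)        # intersection rectangle
    ix2, iy2 = min(xi2, xj2), min(yi2, yj2)
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_i = (xi2 - xi1) * (yi2 - yi1)
    area_j = (xj2 - xj1) * (yj2 - yj1)
    if inter / (area_i + area_j - inter + 1e-8) >= iou_thresh:
        return 3                                   # strong overlap (high IoU)
    # Otherwise bin the center-to-center angle into 8 directions of 45 degrees.
    cxi, cyi = (xi1 + xi2) / 2.0, (yi1 + yi2) / 2.0
    cxj, cyj = (xj1 + xj2) / 2.0, (yj1 + yj2) / 2.0
    angle = math.degrees(math.atan2(cyj - cyi, cxj - cxi)) % 360.0
    return 4 + (int(angle // 45) % 8)              # directional bin 4..11

def build_relation_graph(boxes):
    """Relation-type matrix over all ordered region pairs (diagonal stays 0)."""
    n = len(boxes)
    rel = np.zeros((n, n), dtype=np.int64)
    for i in range(n):
        for j in range(n):
            if i != j:
                rel[i, j] = spatial_relation(boxes[i], boxes[j])
    return rel

For example, build_relation_graph([(10, 10, 50, 50), (20, 20, 40, 40)]) returns a 2x2 matrix whose off-diagonal entries are 1 (the second box lies inside the first) and 2 (the first box covers the second); in the paper, edges of this kind condition the relation encoder whose relation-aware features feed both the captioning module and the answer predictor.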
Pages: 149-154
Number of pages: 6
Related Papers
50 records in total
  • [21] Improving Complex Knowledge Base Question Answering with Relation-Aware Subgraph Retrieval and Reasoning Network
    Luo, Dan
    Sheng, Jiawei
    Xu, Hongbo
    Wang, Lihong
    Wang, Bin
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023
  • [22] Relation-aware attention for video captioning via graph learning
    Tu, Yunbin
    Zhou, Chang
    Guo, Junjun
    Li, Huafeng
    Gao, Shengxiang
    Yu, Zhengtao
    PATTERN RECOGNITION, 2023, 136
  • [23] Semantic Relation-aware Difference Representation Learning for Change Captioning
    Tu, Yunbin
    Yao, Tingting
    Li, Liang
    Lou, Jiedong
    Gao, Shengxiang
    Yu, Zhengtao
    Yan, Chenggang
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 63 - 73
  • [24] Question-Directed Reasoning With Relation-Aware Graph Attention Network for Complex Question Answering Over Knowledge Graph
    Zhang, Geng
    Liu, Jin
    Zhou, Guangyou
    Zhao, Kunsong
    Xie, Zhiwen
    Huang, Bo
    IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2024, 32 : 1915 - 1927
  • [25] Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering
    Dong, Xuanyi
    Zhu, Linchao
    Zhang, De
    Yang, Yi
    Wu, Fei
    PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018, : 54 - 62
  • [26] Region-Object Relation-Aware Dense Captioning via Transformer
    Shao, Zhuang
    Han, Jungong
    Marnerides, Demetris
    Debattista, Kurt
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024
  • [27] A relation-aware representation approach for the question matching system
    Chen, Yanmin
    Chen, Enhong
    Zhang, Kun
    Liu, Qi
    Sun, Ruijun
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2024, 27 (02)
  • [29] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
    Anderson, Peter
    He, Xiaodong
    Buehler, Chris
    Teney, Damien
    Johnson, Mark
    Gould, Stephen
    Zhang, Lei
    2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 6077 - 6086
  • [30] Image captioning for effective use of language models in knowledge-based visual question answering
    Salaberria, Ander
    Azkune, Gorka
    Lacalle, Oier Lopez de
    Soroa, Aitor
    Agirre, Eneko
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 212