EKGRL: Entity-Based Knowledge Graph Representation Learning for Fact-Based Visual Question Answering

Cited by: 0
Authors
Ren, Yongjian [1, 2]
Chen, Xiaotang [1, 2, 3]
Huang, Kaiqi [1, 2, 3]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Excellence Brain Sci & Intelligence Technol, Beijing, Peoples R China
Source
PATTERN RECOGNITION AND COMPUTER VISION, PRCV 2023, PT VI | 2024 / Vol. 14430
Keywords
FVQA; Representation Learning; Psychological Knowledge;
DOI
10.1007/978-981-99-8537-1_39
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Fact-based Visual Question Answering (FVQA) is the task of answering a question based on a given image and the external knowledge associated with it. The reasoning abilities of current FVQA models, including query-based and joint learning methods, are insufficient. To achieve stronger reasoning ability, we propose an entity-based knowledge graph representation learning (EKGRL) method. Our model achieves state-of-the-art performance on the FVQA dataset. Furthermore, we build a psychological fact-based VQA dataset (PFVQA) containing 6129 questions of six different types, which is, as far as we know, the first VQA dataset built on psychological knowledge. We demonstrate that EKGRL also achieves state-of-the-art performance on PFVQA, showing that it maintains good reasoning and knowledge representation performance with external knowledge from both the commonsense and psychological domains.
Pages: 485-496
Number of pages: 12