An analysis of graph convolutional networks and recent datasets for visual question answering

Authors
Abdulganiyu Abdu Yusuf
Feng Chong
Mao Xianling
Affiliations
[1] Beijing Institute of Technology, School of Computer Science and Technology
[2] South-East Information Technology Institute of Beijing Institute of Technology
[3] Beijing Engineering Research Centre of High Volume Language Information Processing and Cloud Computing Application
[4] National Biotechnology Development Agency
Source
Artificial Intelligence Review | 2022 / Vol. 55
Keywords
Computer vision; NLP; VQA; GCN; Datasets
DOI
Not available
Abstract
Graph neural networks are a deep learning approach that has recently been widely applied in both structural and non-structural scenarios because of their strong performance and interpretability. In non-structural scenarios, textual and visual research topics such as visual question answering (VQA) are important and require graph reasoning models. VQA aims to build a system that can answer questions about a given image and understand the semantic meaning behind the image. The critical issues in VQA are to extract the visual and textual features effectively and to map both into a common space. These issues have a great impact on handling goal-driven, reasoning, and scene classification subtasks. In the same vein, it is difficult to compare models' performance because most existing datasets do not group instances into meaningful categories. With the recent advances in graph-based models, considerable effort has been devoted to solving these problems. This study focuses on graph convolutional network (GCN) studies and recent datasets for VQA tasks. Specifically, we review current related studies on GCN for the VQA task. In addition, 18 common and recent VQA datasets are studied, though not all of them are discussed at the same level of detail. A critical review of GCN, datasets, and VQA challenges is further highlighted. Finally, this study will help researchers choose a suitable dataset for a particular VQA subtask, identify VQA challenges and the pros and cons of existing approaches, and further improve GCN for VQA.
Pages: 6277 - 6300
Number of pages: 23
Related papers
  • [1] An analysis of graph convolutional networks and recent datasets for visual question answering
    Yusuf, Abdulganiyu Abdu
    Feng Chong
    Mao Xianling
    ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (08) : 6277 - 6300
  • [2] Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets
    Abdulganiyu Abdu Yusuf
    Feng Chong
    Mao Xianling
    Multimedia Tools and Applications, 2022, 81 : 40361 - 40370
  • [3] Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets
    Yusuf, Abdulganiyu Abdu
    Feng Chong
    Mao Xianling
    MULTIMEDIA TOOLS AND APPLICATIONS, 2022, 81 (28) : 40361 - 40370
  • [4] Object-difference drived graph convolutional networks for visual question answering
    Xi Zhu
    Zhendong Mao
    Zhineng Chen
    Yangyang Li
    Zhaohui Wang
    Bin Wang
    Multimedia Tools and Applications, 2021, 80 : 16247 - 16265
  • [5] Object-difference drived graph convolutional networks for visual question answering
    Zhu, Xi
    Mao, Zhendong
    Chen, Zhineng
    Li, Yangyang
    Wang, Zhaohui
    Wang, Bin
    MULTIMEDIA TOOLS AND APPLICATIONS, 2021, 80 (11) : 16247 - 16265
  • [6] Bilinear Graph Networks for Visual Question Answering
    Guo, Dalu
    Xu, Chang
    Tao, Dacheng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (02) : 1023 - 1034
  • [7] Question Answering by Reasoning Across Documents with Graph Convolutional Networks
    De Cao, Nicola
    Aziz, Wilker
    Titov, Ivan
    2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, 2019 : 2306 - 2317
  • [8] Co-attention graph convolutional network for visual question answering
    Liu, Chuan
    Tan, Ying-Ying
    Xia, Tian-Tian
    Zhang, Jiajing
    Zhu, Ming
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
  • [9] Co-attention graph convolutional network for visual question answering
    Chuan Liu
    Ying-Ying Tan
    Tian-Tian Xia
    Jiajing Zhang
    Ming Zhu
    Multimedia Systems, 2023, 29 : 2527 - 2543
  • [10] Graph neural networks for visual question answering: a systematic review
    Abdulganiyu Abdu Yusuf
    Chong Feng
    Xianling Mao
    Ramadhani Ally Duma
    Mohammed Salah Abood
    Abdulrahman Hamman Adama Chukkol
    Multimedia Tools and Applications, 2024, 83 : 55471 - 55508