An analysis of graph convolutional networks and recent datasets for visual question answering

Cited by: 26
Authors
Yusuf, Abdulganiyu Abdu [1, 4]
Feng Chong [1, 2]
Mao Xianling [1, 3]
Affiliations
[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China
[2] Beijing Inst Technol, South East Informat Technol Inst, Beijing, Peoples R China
[3] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China
[4] Natl Biotechnol Dev Agcy, Abuja, Nigeria
Funding
National Key Research and Development Program of China
Keywords
Computer vision; NLP; VQA; GCN; Datasets; Language
DOI
10.1007/s10462-022-10151-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Graph neural networks (GNNs) are a deep learning approach that has recently been widely applied to structural and non-structural scenarios because of their strong performance and interpretability. In non-structural scenarios, textual and visual research topics such as visual question answering (VQA) are important and require graph reasoning models. VQA aims to build systems that can answer questions about a given image while understanding the semantic meaning underlying it. The critical issues in VQA are to extract visual and textual features effectively and to project both kinds of features into a common space. These issues strongly affect how well models handle goal-driven, reasoning, and scene classification subtasks. In addition, it is difficult to compare models' performance because most existing datasets do not group instances into meaningful categories. With recent advances in graph-based models, considerable effort has been devoted to solving these problems. This study focuses on graph convolutional network (GCN) research and recent datasets for VQA tasks. Specifically, we review current studies that apply GCNs to the VQA task. We also examine 18 common and recent VQA datasets, though not all of them are discussed at the same level of detail. We further present a critical review of GCNs, the datasets, and VQA challenges. Finally, this study aims to help researchers choose a suitable dataset for a particular VQA subtask, identify VQA challenges and the pros and cons of current approaches, and further improve GCNs for VQA.
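The abstract treats GCNs as the core reasoning tool without defining the layer itself. As a point of reference only, the widely used propagation rule of Kipf and Welling is H' = ReLU(D^-1/2 (A + I) D^-1/2 H W), where A is the adjacency matrix, I adds self-loops, D is the degree matrix of A + I, H holds node features, and W is a learned weight matrix. The sketch below is a minimal pure-Python version of that single rule, assuming a small undirected graph whose nodes could stand for detected image regions; the function name `gcn_layer` and the toy graph are hypothetical illustrations, not the method of this review or of any study it covers.

```python
import math

def gcn_layer(adj, features, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 . H . W).

    adj:      n x n symmetric adjacency matrix (list of lists, 0/1 entries)
    features: n x f node feature matrix H
    weight:   f x k weight matrix W (hypothetical, normally learned)
    """
    n = len(adj)
    # Add self-loops so each node keeps its own features: A_hat = A + I
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    # Degree of each node in A_hat
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation: A_norm[i][j] = A_hat[i][j] / sqrt(d_i * d_j)
    a_norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
              for i in range(n)]
    # Aggregate neighbour features: A_norm . H
    f = len(features[0])
    agg = [[sum(a_norm[i][m] * features[m][c] for m in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform followed by ReLU: ReLU(agg . W)
    k = len(weight[0])
    return [[max(0.0, sum(agg[i][c] * weight[c][o] for c in range(f))) for o in range(k)]
            for i in range(n)]

# Toy scene graph (hypothetical): 0 = "person", 1 = "horse", 2 = "field"
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(adj, features, identity)
print([[round(x, 3) for x in row] for row in out])
# → [[0.5, 0.408], [0.816, 0.742], [0.5, 0.908]]
```

With the identity weight matrix the layer reduces to normalised neighbourhood averaging, so the middle node ("horse") mixes features from both of its neighbours, which is the message-passing behaviour that GCN-based VQA models build on when reasoning over relations between regions.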
Journal: Artificial Intelligence Review, 2022, vol. 55
Pages: 6277-6300
Page count: 24