An analysis of graph convolutional networks and recent datasets for visual question answering

Cited by: 26

Authors:

Yusuf, Abdulganiyu Abdu [1,4]

Feng Chong [1,2]

Mao Xianling [1,3]

Affiliations:

[1] Beijing Inst Technol, Sch Comp Sci & Technol, Beijing, Peoples R China

[2] Beijing Inst Technol, South East Informat Technol Inst, Beijing, Peoples R China

[3] Beijing Engn Res Ctr High Volume Language Informa, Beijing, Peoples R China

[4] Natl Biotechnol Dev Agcy, Abuja, Nigeria

Source:

ARTIFICIAL INTELLIGENCE REVIEW | 2022 / Vol. 55 / Issue 08

Funding:

National Key Research and Development Program;

Keywords:

Computer vision; NLP; VQA; GCN; Datasets; LANGUAGE;

DOI:

10.1007/s10462-022-10151-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Graph neural networks are a deep learning approach that has recently been widely applied in both structural and non-structural scenarios because of their strong performance and interpretability. In non-structural scenarios, textual and visual research topics such as visual question answering (VQA) are important and call for graph reasoning models. VQA aims to build a system that can answer questions about a given image and understand the underlying semantic meaning of that image. The critical issues in VQA are to extract the visual and textual features effectively and to map both into a common space. These issues strongly affect the handling of goal-driven, reasoning, and scene classification subtasks. In the same vein, it is difficult to compare models' performance because most existing datasets do not group instances into meaningful categories. With recent advances in graph-based models, considerable effort has been devoted to solving these problems. This study focuses on graph convolutional network (GCN) studies and recent datasets for VQA tasks. Specifically, we review current related studies on GCN for the VQA task. In addition, 18 common and recent datasets for VQA are studied, though not all of them are discussed at the same level of detail. A critical review of GCN, datasets, and VQA challenges is also presented. Finally, this study will help researchers choose a suitable dataset for a particular VQA subtask, identify VQA challenges and the pros and cons of existing approaches, and further improve GCN for VQA.


Pages: 6277 - 6300

Page count: 24
