Object-difference drived graph convolutional networks for visual question answering

Times cited: 23

Authors:

Zhu, Xi [1,2]

Mao, Zhendong [3]

Chen, Zhineng [4]

Li, Yangyang [5]

Wang, Zhaohui [1,2]

Wang, Bin [6]

Affiliations:

[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China

[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China

[3] Univ Sci & Technol China, Hefei, Peoples R China

[4] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China

[5] China Acad Elect & Informat Technol, Beijing, Peoples R China

[6] Xiaomi Inc, Xiaomi AI Lab, Beijing, Peoples R China

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2021 / Vol. 80 / No. 11

Funding:

National Natural Science Foundation of China

Keywords:

Visual question answering; Graph convolutional networks; Object-difference

DOI:

10.1007/s11042-020-08790-0

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Visual Question Answering (VQA), an important task for evaluating the cross-modal understanding capability of an artificial intelligence model, has been an active research topic in both the computer vision and natural language processing communities. Recently, graph-based models have attracted growing interest in VQA because they can model the relationships between objects and offer strong interpretability. Nonetheless, these solutions mainly define the semantic relationship between objects by their similarity, largely ignoring the critical point that the difference between objects can provide more information for establishing the relationships between nodes in the graph. To exploit this, we propose an object-difference-based graph learner, which learns question-adaptive semantic relations by calculating inter-object differences under the guidance of the question. With the learned relationships, the input image can be represented as an object graph that encodes the structural dependencies between objects. In addition, existing graph-based models conveniently use the object boxes pre-extracted by an object detection model as node features, but these features suffer from redundancy. To reduce redundant objects, we introduce a soft-attention mechanism that emphasizes question-related objects. Moreover, we incorporate our object-difference-based graph learner into soft-attention-based Graph Convolutional Networks to capture question-specific objects and their interactions for answer prediction. Our experimental results on the VQA 2.0 dataset demonstrate that our model performs significantly better than the baseline methods.
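The abstract describes the model only at a high level. The following is a minimal, hypothetical sketch in plain Python of the three ideas it names: graph edges derived from question-guided object differences, soft attention over detected objects, and one graph-convolution step. The edge score s_ij = sum_k q_k * |v_ik - v_jk|, the function names, and the absence of learned projection matrices are all illustrative assumptions, not the authors' formulation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def softmax(xs: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def difference_adjacency(objs: Matrix, q: Vector) -> Matrix:
    """Hypothetical question-guided adjacency built from pairwise object differences.

    Assumed edge score: s_ij = sum_k q_k * |v_ik - v_jk|; each row is softmax-normalised.
    """
    adj = []
    for vi in objs:
        scores = [sum(g * abs(a - b) for a, b, g in zip(vi, vj, q)) for vj in objs]
        adj.append(softmax(scores))
    return adj


def object_attention(objs: Matrix, q: Vector) -> Vector:
    """Soft attention weights: softmax of the dot product between each object and q."""
    return softmax([sum(a * g for a, g in zip(v, q)) for v in objs])


def gcn_layer(objs: Matrix, q: Vector) -> Matrix:
    """One graph-convolution step with no learned weights (a simplifying assumption):
    h_i = ReLU(sum_j A_ij * alpha_j * v_j)."""
    adj = difference_adjacency(objs, q)
    alpha = object_attention(objs, q)
    n, dim = len(objs), len(objs[0])
    out = []
    for row in adj:
        h = [sum(row[j] * alpha[j] * objs[j][k] for j in range(n)) for k in range(dim)]
        out.append([max(0.0, x) for x in h])  # ReLU
    return out


if __name__ == "__main__":
    # Toy features for three detected objects and a question embedding (made-up values).
    objects = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [0.0, 2.0, 1.0]]
    question = [0.2, 1.0, -0.5]
    for i, h in enumerate(gcn_layer(objects, question)):
        print(i, [round(x, 3) for x in h])
```

With a zero question vector every edge score and attention logit is zero, so the graph and the attention both become uniform; the question is what makes the structure question-adaptive in this sketch.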


Pages: 16247-16265

Number of pages: 19
