Object-difference drived graph convolutional networks for visual question answering

Cited by: 23
Authors
Zhu, Xi [1, 2]
Mao, Zhendong [3]
Chen, Zhineng [4]
Li, Yangyang [5]
Wang, Zhaohui [1, 2]
Wang, Bin [6]
Affiliations
[1] Chinese Acad Sci, Inst Informat Engn, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Cyber Secur, Beijing, Peoples R China
[3] Univ Sci & Technol China, Hefei, Peoples R China
[4] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[5] China Acad Elect & Informat Technol, Beijing, Peoples R China
[6] Xiaomi Inc, Xiaomi AI Lab, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Visual question answering; Graph convolutional networks; Object-difference;
DOI
10.1007/s11042-020-08790-0
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Visual Question Answering (VQA), an important task for evaluating the cross-modal understanding capability of an Artificial Intelligence model, has been a hot research topic in both the computer vision and natural language processing communities. Recently, graph-based models have received growing interest in VQA for their potential to model the relationships between objects as well as their strong interpretability. Nonetheless, those solutions mainly define the similarity between objects as their semantic relationship, while largely ignoring the critical point that the difference between objects can provide more information for establishing the relationships between nodes in the graph. To exploit this, we propose an object-difference based graph learner, which learns question-adaptive semantic relations by calculating inter-object differences under the guidance of questions. With the learned relationships, the input image can be represented as an object graph encoded with structural dependencies between objects. In addition, existing graph-based models use the object boxes pre-extracted by an object detection model as node features for convenience, but they suffer from the redundancy problem. To reduce the influence of redundant objects, we introduce a soft-attention mechanism to magnify the question-related objects. Moreover, we incorporate our object-difference based graph learner into the soft-attention based Graph Convolutional Networks to capture question-specific objects and their interactions for answer prediction. Our experimental results on the VQA 2.0 dataset demonstrate that our model performs significantly better than baseline methods.
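The pipeline outlined in the abstract (question-guided object differences, soft attention over objects, then graph convolution) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the scoring function (a `tanh` of the question-gated difference), the scaled dot-product attention, the single ReLU layer, and all names (`difference_adjacency`, `soft_attention`, `gcn_layer`, `w_diff`, `w_gcn`) are assumptions chosen for illustration, with hand-picked toy weights in place of learned ones.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def difference_adjacency(objects, question, w_diff):
    """Row-normalized edge weights from question-gated object differences.

    Assumed form: score(i, j) = w_diff . tanh((o_i - o_j) * q).
    """
    rows = []
    for oi in objects:
        scores = []
        for oj in objects:
            guided = [(a - b) * q for a, b, q in zip(oi, oj, question)]
            scores.append(sum(w * math.tanh(g) for w, g in zip(w_diff, guided)))
        rows.append(softmax(scores))
    return rows


def soft_attention(objects, question):
    """Assumed scaled dot-product relevance of each object to the question."""
    scale = math.sqrt(len(question))
    logits = [sum(a * q for a, q in zip(obj, question)) / scale for obj in objects]
    return softmax(logits)


def gcn_layer(objects, adjacency, attention, w_gcn):
    """One graph-convolution step: attend, aggregate neighbors, project, ReLU."""
    weighted = [[a * x for x in obj] for obj, a in zip(objects, attention)]
    n, d, h = len(objects), len(objects[0]), len(w_gcn[0])
    out = []
    for row in adjacency:
        agg = [sum(row[j] * weighted[j][k] for j in range(n)) for k in range(d)]
        out.append([max(0.0, sum(agg[k] * w_gcn[k][c] for k in range(d)))
                    for c in range(h)])
    return out


# Toy inputs: three 2-d object features and a 2-d question vector (hypothetical).
objects = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [1.0, 0.5]
w_diff = [0.3, -0.2]
w_gcn = [[1.0, 0.0, -1.0], [0.5, 1.0, 0.0]]

adjacency = difference_adjacency(objects, question, w_diff)
attention = soft_attention(objects, question)
hidden = gcn_layer(objects, adjacency, attention, w_gcn)
```

In this sketch each adjacency row and the attention vector sum to 1, so `hidden` holds one non-negative feature vector per object that mixes its neighbors according to the difference-based edge weights.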
Pages: 16247-16265
Number of pages: 19