Dynamic Co-attention Network for Visual Question Answering

Cited by: 2
Authors
Ebaid, Doaa B. [1]
Madbouly, Magda M. [1]
El-Zoghabi, Adel A. [1]
Affiliations
[1] Alexandria Univ, Dept Informat Technol, Inst Grad Studies & Res, Alexandria, Egypt
Source
2021 8TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2021) | 2021
Keywords
visual question answering (VQA); attention; coattention; multi-model attention; dynamic routing;
DOI
10.1109/ISCMI53840.2021.9654812
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Visual Question Answering (VQA) involves deriving answers to questions about a particular image. It requires a fine-grained understanding of both the image content and the question content. Recent VQA models use attention mechanisms to find appropriate visual features based on the given question, which requires multi-step inference. The majority of existing work focuses only on visual attention and ignores the role of textual attention in VQA. In this paper, we propose a dynamic capsule co-attention (CapsCoAtt) in which the visual and textual features are treated as capsules and the attention weights are obtained through an iterative process inspired by the capsule network (CapsNet). In addition, to achieve a deep understanding of the questions, we propose a hierarchical question representation with three levels. We evaluate the proposed model on the benchmark dataset VQA 2.0. The results show a significant improvement in VQA performance with the lowest number of parameters compared with the baseline models.
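As a rough illustration of attention weights obtained through an iterative, CapsNet-style process, the sketch below applies generic routing-by-agreement to a set of feature vectors guided by a query vector. It is not the CapsCoAtt model from the paper: the function names (`routing_attention`, `squash`), the initialization of the logits from the query, and the default of three iterations are all assumptions made for illustration.

```python
import math


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squash(vec):
    # CapsNet squashing: keeps the direction, maps the norm into [0, 1).
    sq = dot(vec, vec)
    if sq == 0.0:
        return list(vec)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in vec]


def routing_attention(features, query, iterations=3):
    """Hypothetical sketch: refine attention weights by agreement.

    features: list of equal-length vectors (e.g. image regions or words).
    query:    guiding vector of the same length (e.g. from the other modality).
    Returns (weights, summary) from the last iteration.
    """
    if iterations < 1:
        raise ValueError("iterations must be >= 1")
    dim = len(query)
    # Assumption: start the routing logits from query-feature similarity.
    logits = [dot(f, query) for f in features]
    for step in range(iterations):
        weights = softmax(logits)
        pooled = [sum(w * f[d] for w, f in zip(weights, features))
                  for d in range(dim)]
        summary = squash(pooled)
        if step < iterations - 1:
            # Features that agree with the current summary gain weight.
            logits = [b + dot(f, summary) for b, f in zip(logits, features)]
    return weights, summary


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    question = [1.0, 0.0]
    weights, summary = routing_attention(regions, question)
    print(weights, summary)
```

In a co-attention setting, the same routine could be run in both directions, once with visual features guided by a question vector and once with word features guided by an image vector; how the paper actually couples the two is not stated in the abstract.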
Pages: 125-129
Page count: 5
Related papers
50 in total
  • [1] Co-Attention Network With Question Type for Visual Question Answering
    Yang, Chao
    Jiang, Mengqi
    Jiang, Bin
    Zhou, Weixin
    Li, Keqin
    IEEE ACCESS, 2019, 7 : 40771 - 40781
  • [2] Co-attention Network for Visual Question Answering Based on Dual Attention
    Dong, Feng
    Wang, Xiaofeng
    Oad, Ammar
    Talpur, Mir Sajjad Hussain
    Journal of Engineering Science and Technology Review, 2021, 14 (06) : 116 - 123
  • [3] Co-attention graph convolutional network for visual question answering
    Liu, Chuan
    Tan, Ying-Ying
    Xia, Tian-Tian
    Zhang, Jiajing
    Zhu, Ming
    MULTIMEDIA SYSTEMS, 2023, 29 (05) : 2527 - 2543
  • [4] Co-attention graph convolutional network for visual question answering
    Chuan Liu
    Ying-Ying Tan
    Tian-Tian Xia
    Jiajing Zhang
    Ming Zhu
    Multimedia Systems, 2023, 29 : 2527 - 2543
  • [5] Multi-Channel Co-Attention Network for Visual Question Answering
    Tian, Weidong
    He, Bin
    Wang, Nanxun
    Zhao, Zhongqiu
    2020 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2020,
  • [6] Hierarchical Question-Image Co-Attention for Visual Question Answering
    Lu, Jiasen
    Yang, Jianwei
    Batra, Dhruv
    Parikh, Devi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [7] Deep Modular Co-Attention Networks for Visual Question Answering
    Yu, Zhou
    Yu, Jun
    Cui, Yuhao
    Tao, Dacheng
    Tian, Qi
    2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 6274 - 6283
  • [8] An Effective Dense Co-Attention Networks for Visual Question Answering
    He, Shirong
    Han, Dezhi
    SENSORS, 2020, 20 (17) : 1 - 15
  • [9] Cross-Modal Multistep Fusion Network With Co-Attention for Visual Question Answering
    Lao, Mingrui
    Guo, Yanming
    Wang, Hui
    Zhang, Xin
    IEEE ACCESS, 2018, 6 : 31516 - 31524
  • [10] Bi-direction Co-Attention Network on Visual Question Answering for Blind People
    Tung Le
    Thong Bui
    Huy Tien Nguyen
    Minh Le Nguyen
    FOURTEENTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2021), 2022, 12084