Dynamic Co-attention Network for Visual Question Answering

Citations: 2

Authors:

Ebaid, Doaa B. [1]

Madbouly, Magda M. [1]

El-Zoghabi, Adel A. [1]

Affiliations:

[1] Alexandria Univ, Dept Informat Technol, Inst Grad Studies & Res, Alexandria, Egypt

Source:

2021 8TH INTERNATIONAL CONFERENCE ON SOFT COMPUTING & MACHINE INTELLIGENCE (ISCMI 2021) | 2021

Keywords:

visual question answering (VQA); attention; coattention; multi-model attention; dynamic routing;

DOI:

10.1109/ISCMI53840.2021.9654812

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Visual Question Answering (VQA) involves deriving answers to questions about a particular image. It requires a fine-grained understanding of both the image content and the question content. Recent VQA models use an attention mechanism to find appropriate visual features based on the given question, which requires multi-step inference. The majority of present work focuses only on visual attention and ignores the role of textual attention in VQA. In this paper, we propose a dynamic capsule co-attention (CapsCoAtt) in which the visual and textual features are treated as capsules and the attention weights are obtained through an iterative process inspired by the capsule network (CapsNet). In addition, to achieve a deep understanding of the questions, we propose a hierarchical question representation with three levels. We evaluate the proposed model on the benchmark dataset VQA 2.0. The results show a significant improvement in VQA performance with the lowest number of parameters among the compared baseline models.
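The abstract's idea of obtaining attention weights through an iterative, CapsNet-inspired process can be illustrated with a minimal routing-by-agreement sketch. This is not the authors' CapsCoAtt implementation. The function names, the initialisation of the logits from question-region similarity, the single output capsule, and the choice of three iterations are all assumptions made for illustration only.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def squash(v):
    """CapsNet-style squashing: keeps the direction, maps the norm into [0, 1)."""
    sq = sum(x * x for x in v)
    if sq == 0.0:
        return list(v)
    scale = sq / (1.0 + sq) / math.sqrt(sq)
    return [scale * x for x in v]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def routing_attention(question, regions, iterations=3):
    """Iteratively refine attention weights over image regions.

    Hypothetical helper: `question` is one feature vector and `regions`
    is a list of region feature vectors of the same dimension.
    """
    # Assumption: the routing logits start from question-region similarity.
    logits = [dot(question, r) for r in regions]
    weights, summary = [], []
    for _ in range(iterations):
        # Attention weights are a softmax over the input regions.
        weights = softmax(logits)
        # Weighted sum of the regions, squashed like a capsule output.
        pooled = [
            sum(w * r[k] for w, r in zip(weights, regions))
            for k in range(len(question))
        ]
        summary = squash(pooled)
        # Agreement step: regions aligned with the summary gain weight.
        logits = [b + dot(r, summary) for b, r in zip(logits, regions)]
    return weights, summary


if __name__ == "__main__":
    q = [1.0, 0.0]
    v = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(routing_attention(q, v))
```

Each pass raises the logits of regions that agree with the current summary vector, so the weights sharpen over the iterations instead of being fixed by a single similarity computation. A co-attention variant would run the same loop in the other direction as well, with the pooled visual feature attending over question-word features.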


Pages: 125-129

Page count: 5
