ADAPTIVE ATTENTION FUSION NETWORK FOR VISUAL QUESTION ANSWERING

Times cited: 0
Authors
Gu, Geonmo [1]
Kim, Seong Tae [1]
Ro, Yong Man [1]
Affiliations
[1] Korea Adv Inst Sci & Technol, Image & Video Syst Lab, Daejeon, South Korea
Source
2017 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME) | 2017
Funding
National Research Foundation, Singapore;
Keywords
Visual Question Answering; Visual attention; Textual attention; Adaptive fusion; Deep learning;
DOI
Not available
Chinese Library Classification number
TP31 [Computer software];
Subject classification code
081202 ; 0835 ;
Abstract
Visual Question Answering (VQA) requires automatic understanding of both a reference image and a natural language question. Generating a visual attention map that focuses on the regions related to the context of the question can improve VQA performance. In this paper, we propose an adaptive attention-based VQA network. The proposed method utilizes the complementary information in the attention maps obtained from three levels of word embedding (word level, phrase level, and question level), and adaptively fuses this information to represent the image-question pair appropriately. Comparative experiments were conducted on the public COCO-QA database to validate the proposed method. The experimental results show that the proposed method outperforms previous methods in terms of accuracy.
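The abstract does not say how the adaptive fusion is computed. As a rough illustration only, the sketch below assumes one common choice: each of the three levels (word, phrase, question) yields an attended feature vector, a scoring vector assigns each level a score, and a softmax over those scores gives the fusion weights. The names `adaptive_fuse` and `gate_vector` are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(level_features, gate_vector):
    """Fuse per-level attended features with input-dependent weights.

    level_features: one feature vector per level (word, phrase, question),
                    all of the same length.
    gate_vector:    hypothetical learned scoring vector of that length
                    (an assumption; the paper's gating may differ).
    Returns (fused_vector, fusion_weights).
    """
    # Score each level by its dot product with the scoring vector.
    scores = [sum(f * g for f, g in zip(feat, gate_vector))
              for feat in level_features]
    # Normalize the scores into weights that sum to 1.
    alphas = softmax(scores)
    # Weighted sum of the level features, computed per dimension.
    dim = len(gate_vector)
    fused = [sum(a * feat[i] for a, feat in zip(alphas, level_features))
             for i in range(dim)]
    return fused, alphas


# Toy features for the word, phrase, and question levels.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fused, alphas = adaptive_fuse(features, gate_vector=[0.5, -0.5])
```

Because the weights depend on the features themselves, different image-question pairs can emphasize different levels, which is one reading of "adaptively fuses" in the abstract.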
Pages: 997-1002
Page count: 6
Related papers
50 in total
  • [31] Multi-Tier Attention Network using Term-weighted Question Features for Visual Question Answering
    Manmadhan, Sruthy
    Kovoor, Binsu C.
    IMAGE AND VISION COMPUTING, 2021, 115
  • [32] Re-Attention for Visual Question Answering
    Guo, Wenya
    Zhang, Ying
    Yang, Jufeng
    Yuan, Xiaojie
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2021, 30 : 6730 - 6743
  • [33] Multimodal Encoders and Decoders with Gate Attention for Visual Question Answering
    Li, Haiyan
    Han, Dezhi
    COMPUTER SCIENCE AND INFORMATION SYSTEMS, 2021, 18 (03) : 1023 - 1040
  • [34] Visual question answering model based on graph neural network and contextual attention
    Sharma, Himanshu
    Jalal, Anand Singh
    IMAGE AND VISION COMPUTING, 2021, 110
  • [35] Efficient Multi-step Reasoning Attention Network for Visual Question Answering
    Zhang, Haotian
    Wu, Wei
    Zhang, Meng
    THIRTEENTH INTERNATIONAL CONFERENCE ON GRAPHICS AND IMAGE PROCESSING (ICGIP 2021), 2022, 12083
  • [37] CRA-Net: Composed Relation Attention Network for Visual Question Answering
    Peng, Liang
    Yang, Yang
    Wang, Zheng
    Wu, Xiao
    Huang, Zi
    PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019, : 1202 - 1210
  • [38] Compound-Attention Network with Original Feature injection for visual question and answering
    Wu, Chunlei
    Lu, Jing
    Li, Haisheng
    Wu, Jie
    Duan, Hailong
    Yuan, Shaozu
    SIGNAL IMAGE AND VIDEO PROCESSING, 2021, 15 (08) : 1853 - 1861
  • [39] Multi-stage hybrid embedding fusion network for visual question answering
    Lao, Mingrui
    Guo, Yanming
    Pu, Nan
    Chen, Wei
    Liu, Yu
    Lew, Michael S.
    NEUROCOMPUTING, 2021, 423 : 541 - 550
  • [40] Information fusion in visual question answering: A Survey
    Zhang, Dongxiang
    Cao, Rui
    Wu, Sai
    INFORMATION FUSION, 2019, 52 : 268 - 280