CAAN: Context-Aware attention network for visual question answering

Cited by: 47
Authors
Chen, Chongqing [1 ]
Han, Dezhi [1 ]
Chang, Chin-Chen [2]
Affiliations
[1] Shanghai Maritime Univ, Sch Informat Engn, Shanghai 201306, Peoples R China
[2] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung 407, Taiwan
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Visual question answering; Attention mechanism; Understanding bias; Absolute position; Contextual information; FUSION;
DOI
10.1016/j.patcog.2022.108980
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Understanding multimodal information is the key to visual question answering (VQA) tasks. Most existing approaches use attention mechanisms to acquire a fine-grained understanding of this information. However, attention mechanisms alone do not solve the potential understanding-bias problem. Hence, this paper introduces contextual information into VQA for the first time and presents a context-aware attention network (CAAN) to tackle this problem. Building on the modular co-attention network (MCAN) framework, CAAN makes two main contributions: it designs a novel absolute-position calculation method based on the coordinates of each image region and the image's actual size, integrating the position information of all image regions as contextual information to enhance the visual representation; and, drawing on the question itself, it introduces several internal contextual representations that participate in modeling the question words, resolving the understanding bias caused by similarity among question words. Additionally, we design two models of different scales, CAAN-base and CAAN-large, to explore the effect of the field of view on interaction. Finally, extensive experiments show that CAAN significantly outperforms MCAN and achieves comparable or even better performance than other state-of-the-art approaches, proving that our method can tackle the understanding bias. (c) 2022 Elsevier Ltd. All rights reserved.
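The absolute-position idea in the abstract lends itself to a short illustration. Below is a minimal PyTorch sketch, not the paper's actual code: it assumes region features from an object detector, bounding boxes in absolute pixel coordinates, and a learned linear projection. The class name AbsolutePositionContext, the 5-dimensional position descriptor, and the additive integration into the visual features are all assumptions for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class AbsolutePositionContext(nn.Module):
    """Hypothetical sketch of CAAN-style absolute position context.

    Normalizes each region's bounding box by the image's actual size,
    projects the result, and adds it to the region features as
    contextual information. Dimensions and the projection are
    assumptions, not the paper's verified design.
    """

    def __init__(self, d_model: int = 512):
        super().__init__()
        # 5-D descriptor: x1/W, y1/H, x2/W, y2/H, box area / image area
        self.proj = nn.Linear(5, d_model)

    def forward(self, feats: torch.Tensor, boxes: torch.Tensor,
                img_size: torch.Tensor) -> torch.Tensor:
        # feats:    (B, K, d_model) region features from a detector
        # boxes:    (B, K, 4) absolute (x1, y1, x2, y2) coordinates
        # img_size: (B, 2) actual (W, H) of each image
        W = img_size[:, 0].view(-1, 1)
        H = img_size[:, 1].view(-1, 1)
        x1, y1, x2, y2 = boxes.unbind(dim=-1)
        area = (x2 - x1) * (y2 - y1) / (W * H)
        pos = torch.stack([x1 / W, y1 / H, x2 / W, y2 / H, area], dim=-1)
        # Integrate position information as context for the visual features
        return feats + self.proj(pos)

# Usage with dummy inputs: 2 images, 36 regions each
enc = AbsolutePositionContext(d_model=512)
feats = torch.randn(2, 36, 512)
boxes = torch.rand(2, 36, 4) * 600.0
img_size = torch.tensor([[640.0, 480.0], [800.0, 600.0]])
out = enc(feats, boxes, img_size)   # shape (2, 36, 512)
```

In a full model along MCAN's lines, such position-enhanced region features would then feed the stacked self-attention and guided-attention layers; how CAAN combines this with its question-side contextual representations is detailed in the paper itself.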
Pages: 14
Related Papers
50 records in total
  • [1] CAAN: Context-Aware attention network for visual question answering
    Chen, Chongqing
    Han, Dezhi
    Chang, Chin-Chen
    PATTERN RECOGNITION, 2022, 132
  • [2] A Context-aware Attention Network for Interactive Question Answering
    Li, Huayu
    Min, Martin Renqiang
    Ge, Yong
    Kadav, Asim
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 927 - 935
  • [3] Boosting Visual Question Answering with Context-aware Knowledge Aggregation
    Li, Guohao
    Wang, Xin
    Zhu, Wenwu
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1227 - 1235
  • [4] Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
    Naik, Nandita
    Potts, Christopher
    Kreiss, Elisa
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2813 - 2817
  • [5] Context-Aware Answer Extraction in Question Answering
    Seonwoo, Yeon
    Kim, Ji-Hoon
    Ha, Jung-Woo
    Oh, Alice
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2418 - 2428
  • [6] Graph Attention Network for Context-Aware Visual Tracking
    Shao, Yanyan
    Guo, Dongyan
    Cui, Ying
    Wang, Zhenhua
    Zhang, Liyan
    Zhang, Jianhua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [7] Context-aware Multi-level Question Embedding Fusion for visual question answering
    Li, Shengdong
    Gong, Chen
    Zhu, Yuqing
    Luo, Chuanwen
    Hong, Yi
    Lv, Xueqiang
    INFORMATION FUSION, 2024, 102
  • [8] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
  • [9] Advanced Visual and Textual Co-context Aware Attention Network with Dependent Multimodal Fusion Block for Visual Question Answering
    Asri, H. S.
    Safabakhsh, R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (40): 87959 - 87986
  • [10] Depth-Aware and Semantic Guided Relational Attention Network for Visual Question Answering
    Liu, Yuhang
    Wei, Wei
    Peng, Daowan
    Mao, Xian-Ling
    He, Zhiyong
    Zhou, Pan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 5344 - 5357