CAAN: Context-Aware attention network for visual question answering

Cited by: 47
Authors
Chen, Chongqing [1 ]
Han, Dezhi [1 ]
Chang, Chin-Chen [2]
Affiliations
[1] Shanghai Maritime Univ, Sch Informat Engn, Shanghai 201306, Peoples R China
[2] Feng Chia Univ, Dept Informat Engn & Comp Sci, Taichung 407, Taiwan
Funding
Natural Science Foundation of Shanghai; National Natural Science Foundation of China;
Keywords
Visual question answering; Attention mechanism; Understanding bias; Absolute position; Contextual information; FUSION;
DOI
10.1016/j.patcog.2022.108980
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Understanding multimodal information is the key to visual question answering (VQA) tasks. Most existing approaches use attention mechanisms to acquire a fine-grained understanding of this information. However, attention mechanisms alone do not solve the potential understanding-bias problem. Hence, this paper introduces contextual information into VQA for the first time and presents a context-aware attention network (CAAN) to tackle this problem. Building on the modular co-attention network (MCAN) framework, CAAN makes two main contributions: it designs a novel absolute-position calculation method based on the coordinates of each image region and the image's actual size, integrating the position information of all image regions as contextual information to enhance the visual representation; and, drawing on the question itself, it introduces several internal contextual representations that participate in modeling the question words, resolving the understanding bias caused by similarity among question words. Additionally, we design two models of different scales, CAAN-base and CAAN-large, to explore the effect of the field of view on interaction. Finally, extensive experiments show that CAAN significantly outperforms MCAN and achieves comparable or even better performance than other state-of-the-art approaches, proving that our method can tackle the understanding bias. (c) 2022 Elsevier Ltd. All rights reserved.
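The absolute-position idea in the abstract lends itself to a short illustration. Below is a minimal PyTorch sketch, not the paper's actual code: it assumes region features from an object detector, bounding boxes in absolute pixel coordinates, and a learned linear projection. The class name AbsolutePositionContext, the 5-dimensional position descriptor, and the additive integration into the visual features are all assumptions for illustration; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn

class AbsolutePositionContext(nn.Module):
    """Hypothetical sketch of CAAN-style absolute position context.

    Normalizes each region's bounding box by the image's actual size,
    projects the result, and adds it to the region features as
    contextual information. Dimensions and the projection are
    assumptions, not the paper's verified design.
    """

    def __init__(self, d_model: int = 512):
        super().__init__()
        # 5-D descriptor: x1/W, y1/H, x2/W, y2/H, box area / image area
        self.proj = nn.Linear(5, d_model)

    def forward(self, feats: torch.Tensor, boxes: torch.Tensor,
                img_size: torch.Tensor) -> torch.Tensor:
        # feats:    (B, K, d_model) region features from a detector
        # boxes:    (B, K, 4) absolute (x1, y1, x2, y2) coordinates
        # img_size: (B, 2) actual (W, H) of each image
        W = img_size[:, 0].view(-1, 1)
        H = img_size[:, 1].view(-1, 1)
        x1, y1, x2, y2 = boxes.unbind(dim=-1)
        area = (x2 - x1) * (y2 - y1) / (W * H)
        pos = torch.stack([x1 / W, y1 / H, x2 / W, y2 / H, area], dim=-1)
        # Integrate position information as context for the visual features
        return feats + self.proj(pos)

# Usage with dummy inputs: 2 images, 36 regions each
enc = AbsolutePositionContext(d_model=512)
feats = torch.randn(2, 36, 512)
boxes = torch.rand(2, 36, 4) * 600.0
img_size = torch.tensor([[640.0, 480.0], [800.0, 600.0]])
out = enc(feats, boxes, img_size)   # shape (2, 36, 512)
```

In a full model along MCAN's lines, such position-enhanced region features would then feed the stacked self-attention and guided-attention layers; how CAAN combines this with its question-side contextual representations is detailed in the paper itself.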
Pages: 14
Related Papers
50 records in total
  • [1] CAAN: Context-Aware attention network for visual question answering
    Chen, Chongqing
    Han, Dezhi
    Chang, Chin-Chen
    PATTERN RECOGNITION, 2022, 132
  • [2] A Context-aware Attention Network for Interactive Question Answering
    Li, Huayu
    Min, Martin Renqiang
    Ge, Yong
    Kadav, Asim
    KDD'17: PROCEEDINGS OF THE 23RD ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2017, : 927 - 935
  • [3] Boosting Visual Question Answering with Context-aware Knowledge Aggregation
    Li, Guohao
    Wang, Xin
    Zhu, Wenwu
    MM '20: PROCEEDINGS OF THE 28TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 2020, : 1227 - 1235
  • [4] Context-VQA: Towards Context-Aware and Purposeful Visual Question Answering
    Naik, Nandita
    Potts, Christopher
    Kreiss, Elisa
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION WORKSHOPS, ICCVW, 2023, : 2813 - 2817
  • [5] Context-Aware Answer Extraction in Question Answering
    Seonwoo, Yeon
    Kim, Ji-Hoon
    Ha, Jung-Woo
    Oh, Alice
    PROCEEDINGS OF THE 2020 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP), 2020, : 2418 - 2428
  • [6] Graph Attention Network for Context-Aware Visual Tracking
    Shao, Yanyan
    Guo, Dongyan
    Cui, Ying
    Wang, Zhenhua
    Zhang, Liyan
    Zhang, Jianhua
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024,
  • [7] Context-aware Multi-level Question Embedding Fusion for visual question answering
    Li, Shengdong
    Gong, Chen
    Zhu, Yuqing
    Luo, Chuanwen
    Hong, Yi
    Lv, Xueqiang
    INFORMATION FUSION, 2024, 102
  • [8] Relation-Aware Graph Attention Network for Visual Question Answering
    Li, Linjie
    Gan, Zhe
    Cheng, Yu
    Liu, Jingjing
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, : 10312 - 10321
  • [9] Advanced Visual and Textual Co-context Aware Attention Network with Dependent Multimodal Fusion Block for Visual Question Answering
    Asri, H. S.
    Safabakhsh, R.
    MULTIMEDIA TOOLS AND APPLICATIONS, 2024, 83 (40): 87959 - 87986
  • [10] Depth-Aware and Semantic Guided Relational Attention Network for Visual Question Answering
    Liu, Yuhang
    Wei, Wei
    Peng, Daowan
    Mao, Xian-Ling
    He, Zhiyong
    Zhou, Pan
    IEEE TRANSACTIONS ON MULTIMEDIA, 2023, 25 : 5344 - 5357