MUREL: Multimodal Relational Reasoning for Visual Question Answering

Cited by: 230
Authors
Cadene, Remi [1]
Ben-younes, Hedi [1, 2]
Cord, Matthieu [1]
Thome, Nicolas [3]
Affiliations
[1] Sorbonne Univ, CNRS, LIP6, 4 Pl Jussieu, F-75005 Paris, France
[2] Heuritech, 110 Ave Republ, F-75011 Paris, France
[3] Conservatoire Natl Arts & Metiers, F-75003 Paris, France
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00209
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multimodal attentional networks are currently state-of-the-art models for Visual Question Answering (VQA) tasks involving real images. Although attention lets a model focus on the visual content relevant to the question, this simple mechanism is arguably insufficient to model the complex reasoning features required for VQA or other high-level tasks. In this paper, we propose MuRel, a multimodal relational network which is learned end-to-end to reason over real images. Our first contribution is the introduction of the MuRel cell, an atomic reasoning primitive that represents interactions between the question and image regions by a rich vectorial representation, and models region relations with pairwise combinations. Secondly, we incorporate the cell into a full MuRel network, which progressively refines visual and question interactions, and can be leveraged to define visualization schemes finer than mere attention maps. We validate the relevance of our approach with various ablation studies, and show its superiority to attention-based methods on three datasets: VQA 2.0, VQA-CP v2 and TDIUC. Our final MuRel network is competitive with or outperforms state-of-the-art results in this challenging context.
Pages: 1989-1998
Number of pages: 10