Dynamic Fusion with Intra- and Inter-modality Attention Flow for Visual Question Answering

Times cited: 297
Authors
Gao, Peng [1]
Jiang, Zhengkai [3]
You, Haoxuan [4]
Lu, Pan [4]
Hoi, Steven [2]
Wang, Xiaogang [1]
Li, Hongsheng [1]
Affiliations
[1] Chinese Univ Hong Kong, CUHK SenseTime Joint Lab, Hong Kong, Peoples R China
[2] Singapore Management Univ, Singapore, Singapore
[3] CASIA, NLPR, Beijing, Peoples R China
[4] Tsinghua Univ, Beijing, Peoples R China
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.00680
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Learning effective fusion of multi-modality features is at the heart of visual question answering. We propose a novel method for dynamically fusing multi-modal features with intra- and inter-modality information flow, which alternately passes dynamic information between and across the visual and language modalities. It can robustly capture the high-level interactions between the language and vision domains, thus significantly improving visual question answering performance. We also show that the proposed dynamic intra-modality attention flow, conditioned on the other modality, can dynamically modulate the intra-modality attention of the current modality, which is vital for multi-modality feature fusion. Experimental evaluations on the VQA 2.0 dataset show that the proposed method achieves state-of-the-art VQA performance. Extensive ablation studies are carried out for a comprehensive analysis of the proposed method.
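The abstract describes the two attention flows only in words. Below is a minimal single-head NumPy sketch of the ideas, written under stated assumptions and not taken from the authors' code. Inter-modality flow is modeled as scaled dot-product cross-attention between region features R and word features E. Dynamic intra-modality flow is modeled as self-attention whose query and key channels are scaled by (1 + sigmoid gate), with the gate computed from the other modality's mean-pooled feature. The weight names (rq, gate_e2r, and so on), the toy feature sizes, and the residual combination in the hypothetical dfaf_block are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attend(q, k, v):
    # Scaled dot-product attention; each row of `weights` sums to 1.
    weights = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)
    return weights @ v, weights


def inter_modality_flow(R, E, W):
    # Cross-modal flow (assumed form): regions attend over words,
    # and words attend over regions.
    r_upd, _ = attend(R @ W["rq"], E @ W["ek"], E @ W["ev"])
    e_upd, _ = attend(E @ W["eq"], R @ W["rk"], R @ W["rv"])
    return r_upd, e_upd


def dynamic_intra_modality_flow(R, E, W):
    # Channel gates come from the *other* modality's mean-pooled feature
    # (assumed gating form: scale Q and K by 1 + sigmoid(gate)).
    g_r = 1.0 + sigmoid(E.mean(axis=0) @ W["gate_e2r"])  # modulates visual Q/K
    g_e = 1.0 + sigmoid(R.mean(axis=0) @ W["gate_r2e"])  # modulates word Q/K
    r_upd, r_w = attend(g_r * (R @ W["rq"]), g_r * (R @ W["rk"]), R @ W["rv"])
    e_upd, e_w = attend(g_e * (E @ W["eq"]), g_e * (E @ W["ek"]), E @ W["ev"])
    return r_upd, e_upd, r_w, e_w


def dfaf_block(R, E, W_inter, W_intra):
    # Hypothetical fusion block: inter flow, then dynamic intra flow,
    # each added back residually (the residual form is an assumption).
    r_upd, e_upd = inter_modality_flow(R, E, W_inter)
    R, E = R + r_upd, E + e_upd
    r_upd, e_upd, _, _ = dynamic_intra_modality_flow(R, E, W_intra)
    return R + r_upd, E + e_upd


# Usage with toy sizes: 5 image regions, 4 question words, 8 channels.
rng = np.random.default_rng(0)
d = 8
R = rng.standard_normal((5, d))
E = rng.standard_normal((4, d))


def make_weights(names):
    # Random projection matrices standing in for learned weights.
    return {n: rng.standard_normal((d, d)) / np.sqrt(d) for n in names}


W_inter = make_weights(["rq", "rk", "rv", "eq", "ek", "ev"])
W_intra = make_weights(["rq", "rk", "rv", "eq", "ek", "ev", "gate_e2r", "gate_r2e"])
R_out, E_out = dfaf_block(R, E, W_inter, W_intra)
```

The point the abstract emphasizes shows up in dynamic_intra_modality_flow: the gate on the visual self-attention is computed from the word features, so the same set of regions can attend to one another differently depending on the question.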
Pages: 6632-6641
Page count: 10