共 29 条
[1]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[2]
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
[J].
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2017,
:2631-2639
[3]
Multimodal Encoder-Decoder Attention Networks for Visual Question Answering
[J].
IEEE ACCESS,
2020, 8
:35662-35671
[4]
Exploring Logical Reasoning for Referring Expression Comprehension
[J].
PROCEEDINGS OF THE 29TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, MM 2021,
2021,
:5047-5055
[5]
Dauphin YN, 2017, PR MACH LEARN RES, V70
[6]
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
[J].
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017),
2017,
:6325-6334
[8]
Hochreiter S, 1997, NEURAL COMPUT, V9, P1735, DOI [10.1162/neco.1997.9.1.1, 10.1007/978-3-642-24797-2]
[9]
Attention on Attention for Image Captioning
[J].
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019),
2019,
:4633-4642
[10]
In Defense of Grid Features for Visual Question Answering
[J].
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020),
2020,
:10264-10273