49 entries in total
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[2]
Neural Module Networks
[J].
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2016,
:39-48
[3]
Andreas Jacob, 2016, P NAACL HLT, P1545, DOI 10.18653/V1/N16-1181
[4]
[Anonymous], 2018, INT C LEARN REPR
[5]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[6]
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
[J].
2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2017,
:2631-2639
[7]
Counting Everyday Objects in Everyday Scenes
[J].
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017),
2017,
:4428-4437
[8]
Chen D., 2014, P 2014 C EMP METH NA, P740, DOI 10.3115/V1/D14-1082
[9]
Chung J., 2014, EMPIRICAL EVALUATION
[10]
Feng J., ABS160401485