26 entries in total
[1] MUTAN: Multimodal Tucker Fusion for Visual Question Answering [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 2631-2639.
[2] Query-guided Regression Network with Context Policy for Phrase Grounding [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 824-832.
[4] Visual Dialog [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 1080-1089.
[5] He KM. Mask R-CNN [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 2980. DOI: 10.1109/ICCV.2017.322; 10.1109/TPAMI.2018.2844175.
[6] Modeling Relationships in Referential Expressions with Compositional Modular Networks [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 4418-4427.
[7] Natural Language Object Retrieval [J]. 2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016: 4555-4564.
[8] Inferring and Executing Programs for Visual Reasoning [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 3008-3017.
[9] Microsoft COCO: Common Objects in Context [J]. COMPUTER VISION - ECCV 2014, PT V, 2014, 8693: 740-755.
[10] Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning [J]. ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING - ICANN 2018, PT I, 2018, 11139: 176-187.