44 entries in total
[21]
Leveraging Visual Question Answering for Image-Caption Ranking [J]. COMPUTER VISION - ECCV 2016, PT II, 2016, 9906: 261-277
[22]
Focus Your Attention: A Bidirectional Focal Attention Network for Image-Text Matching [J]. PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19), 2019: 3-11
[23]
Cross-modal Moment Localization in Videos [J]. PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18), 2018: 843-851
[24]
Lu JS, 2016, ADV NEUR IN, V29
[25]
Mikolov T, 2013, INT C LEARN REPR
[26]
Dual Attention Networks for Multimodal Reasoning and Matching [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 2156-2164
[27]
Rajpurkar Pranav, 2016, P 2016 C EMP METH NA
[29]
Rush Alexander M., 2015, Proc. EMNLP, P379
[30]
Adversarial Representation Learning for Text-to-Image Matching [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 5813-5823