53 entries in total
[21] Towards Unsupervised Image Captioning with Shared Multimodal Embeddings. 2019 IEEE/CVF International Conference on Computer Vision (ICCV 2019), 2019: 7413-7423.
[22] Lample Guillaume, 2018, C TRACK P
[23] Lee D.-H., 2013, Workshop on Challenges in Representation Learning, ICML, V3, P881
[25] Adding Chinese Captions to Images. ICMR'16: Proceedings of the 2016 ACM International Conference on Multimedia Retrieval, 2016: 271-275.
[26] Microsoft COCO: Common Objects in Context. Computer Vision - ECCV 2014, Part V, 2014, 8693: 740-755.
[27] Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning. 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), 2017: 3242-3250.
[28] X-Linear Attention Networks for Image Captioning. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 10968-10977.
[29] CRA-Net: Composed Relation Attention Network for Visual Question Answering. Proceedings of the 27th ACM International Conference on Multimedia (MM'19), 2019: 1202-1210.
[30] Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models. 2015 IEEE International Conference on Computer Vision (ICCV), 2015: 2641-2649.