共 53 条
[1]
Anderson P., 2017, P 2017 C EMPIRICAL M, P936, DOI 10.18653/v1/D17-1098
[2]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[3]
Artetxe Mikel, 2018, Unsupervised neural machine translation, DOI DOI 10.18653/V1/D18-1399
[4]
Bahdanau D, 2016, Arxiv, DOI arXiv:1409.0473
[5]
Video-Based Cross-Modal Recipe Retrieval
[J].
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19),
2019,
:1685-1693
[6]
Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings
[J].
ACM/SIGIR PROCEEDINGS 2018,
2018,
:35-44
[7]
Donahue J, 2015, PROC CVPR IEEE, P2625, DOI 10.1109/CVPR.2015.7298878
[8]
Fang H, 2015, PROC CVPR IEEE, P1473, DOI 10.1109/CVPR.2015.7298754
[9]
Unsupervised Image Captioning
[J].
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019),
2019,
:4120-4129