共 61 条
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[2]
SPICE: Semantic Propositional Image Caption Evaluation
[J].
COMPUTER VISION - ECCV 2016, PT V,
2016, 9909
:382-398
[3]
[Anonymous], 2012, P 13 C EUR CHAPT ASS
[4]
Bahdanau D, 2016, Arxiv, DOI [arXiv:1409.0473, 10.48550/arXiv.1409.0473,1409.0473, DOI 10.48550/ARXIV.1409.0473,1409.0473]
[5]
StructCap: Structured Semantic Embedding for Image Captioning
[J].
PROCEEDINGS OF THE 2017 ACM MULTIMEDIA CONFERENCE (MM'17),
2017,
:46-54
[6]
SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning
[J].
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017),
2017,
:6298-6306
[7]
Meshed-Memory Transformer for Image Captioning
[J].
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020),
2020,
:10575-10584
[8]
Fang H, 2015, PROC CVPR IEEE, P1473, DOI 10.1109/CVPR.2015.7298754
[9]
Every Picture Tells a Story: Generating Sentences from Images
[J].
COMPUTER VISION-ECCV 2010, PT IV,
2010, 6314
:15-+
[10]
Aligning Linguistic Words and Visual Semantic Units for Image Captioning
[J].
PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA (MM'19),
2019,
:765-773