共 47 条
[1]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[2]
SPICE: Semantic Propositional Image Caption Evaluation
[J].
COMPUTER VISION - ECCV 2016, PT V,
2016, 9909
:382-398
[3]
CaMEL: Mean Teacher Learning for Image Captioning
[J].
2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR),
2022,
:4087-4094
[5]
Meshed-Memory Transformer for Image Captioning
[J].
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020),
2020,
:10575-10584
[6]
Denkowski MJ, 2014, P 9 WORKSH STAT MACH, P376, DOI DOI 10.3115/V1/W14-3348
[7]
Devlin J, 2019, Arxiv, DOI [arXiv:1810.04805, DOI 10.48550/ARXIV.1810.04805]
[8]
Dosovitskiy A, 2021, Arxiv, DOI [arXiv:2010.11929, DOI 10.48550/ARXIV.2010.11929]
[10]
Normalized and Geometry-Aware Self-Attention Network for Image Captioning
[J].
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020),
2020,
:10324-10333