48 entries in total
[1]
Anderson P, 2018, PROC CVPR IEEE, P6077, DOI 10.1109/CVPR.2018.00636
[2]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[6]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[7]
Dosovitskiy A, 2021, PROC INT C LEARN REP, P2021, DOI 10.48550/ARXIV.2010.11929
[8]
Every Picture Tells a Story: Generating Sentences from Images [J]. COMPUTER VISION-ECCV 2010, PT IV, 2010, 6314: 15-+
[9]
Gao CY, 2021, arXiv, arXiv:2101.03036
[10]
Gao HY, 2019, PR MACH LEARN RES, V97