41 entries in total
[1]
Hudson DA, 2019, Arxiv, DOI arXiv:1902.09506
[2]
nocaps: novel object captioning at scale [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 8947-8956
[3]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[4]
BROWN PF, 1991, 29 ANN M ASS COMPUTA, P169
[5]
Chen WH, 2020, Arxiv, DOI arXiv:1910.03230
[6]
Chen YC, 2020, Arxiv, DOI [arXiv:1909.11740, DOI 10.48550/ARXIV.1909.11740]
[7]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
[8]
Frome A., 2013, Advances in neural information processing systems, V26, P2121
[9]
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 6325-6334
[10]
Towards Learning a Generic Agent for Vision-and-Language Navigation via Pre-training [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 13134-13143