48 entries in total
[2]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
[3]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[5]
Ba JL, 2016, arXiv
[7]
Bhakthavatsalam S, 2020, arXiv, DOI arXiv:2006.07510
[8]
Brown T. B., 2020, P ADV NEUR INF PROC
[9]
Bugliarello E, 2021, arXiv, DOI arXiv:2011.15124
[10]
Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171