共 61 条
[1]
2023, Arxiv, DOI [arXiv:2303.08774, DOI 10.48550/ARXIV.2303.08774]
[2]
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:4971-4980
[3]
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
[J].
2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR),
2018,
:6077-6086
[4]
VQA: Visual Question Answering
[J].
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV),
2015,
:2425-2433
[5]
Belinkov Y, 2019, Arxiv, DOI arXiv:1907.04389
[6]
Behind the Scene: Revealing the Secrets of Pre-trained Vision-and-Language Models
[J].
COMPUTER VISION - ECCV 2020, PT VI,
2020, 12351
:565-580
[7]
Chen L, 2023, Arxiv, DOI arXiv:2110.01013
[8]
Counterfactual Samples Synthesizing for Robust Visual Question Answering
[J].
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020),
2020,
:10797-10806
[9]
UNITER: UNiversal Image-TExt Representation Learning
[J].
COMPUTER VISION - ECCV 2020, PT XXX,
2020, 12375
:104-120
[10]
Clark C, 2019, 2019 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING AND THE 9TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING (EMNLP-IJCNLP 2019), P4069