48 references in total
[1]
Alayrac JB, 2022, ADV NEUR IN
[2]
VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
[3]
Awadalla A, 2023, arXiv, arXiv:2308.01390, DOI 10.48550/arXiv.2308.01390
[4]
Baik S, 2020, ADV NEUR IN, V33
[5]
Ben-Zaken E, 2022, PROCEEDINGS OF THE 60TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2022): (SHORT PAPERS), VOL 2, P1
[6]
Brown TB, 2020, ADV NEUR IN, V33
[7]
Cho J, 2021, PR MACH LEARN RES, V139
[8]
Finn C, 2017, PR MACH LEARN RES, V70
[9]
Physically Grounded Vision-Language Models for Robotic Manipulation [J]. 2024 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2024), 2024: 12462-12469
[10]
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 6325-6334