68 entries in total
- [1] Alayrac JB, 2022, ADV NEUR IN
- [2] Anil R, 2023, Arxiv, DOI [arXiv:2305.10403, DOI 10.48550/ARXIV.2305.10403]
- [3] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
- [4] LaTr: Layout-Aware Transformer for Scene-Text VQA [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 16527-16537
- [5] Scene Text Visual Question Answering [J]. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 4290-4300
- [6] Brown TB, 2020, ADV NEUR IN, V33
- [7] VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 18009-18019
- [8] Chen K, 2016, Arxiv, DOI arXiv:1511.05960
- [9] Quantum Long Short-Term Memory [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 8622-8626
- [10] Chen YC, 2019, AEBMR ADV ECON, V106, P104, DOI 10.1007/978-3-030-58577-8_7