92 entries in total
- [1] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
- [2] Baker C. F., 1998, P 17 INT C COMP LING, DOI 10.3115/980845.980860
- [3] Bengio Y., 2009, P INT C MACH LEARN, P41, DOI 10.1145/1553374.1553380
- [4] Biesialska Magdalena, 2020, P 28 INT C COMPUTATI, P6523
- [5] Bird Steven, 2009, Natural language processing with Python: analyzing text with the natural language toolkit
- [6] Bravo Maria A, 2022, ARXIV
- [7] Chen K., 2023, ARXIV
- [8] Chen VS, 2019, IEEE I CONF COMP VIS, P2580, DOI 10.1109/ICCV.2019.00267
- [9] Chen YY, 2024, arXiv, arXiv:2309.04461
- [10] Collaborative Transformers for Grounded Situation Recognition [J]. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 19627-19636