64 entries in total
- [1] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
- [2] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
- [3] Grounding Distributional Semantics in the Visual World [J]. LANGUAGE AND LINGUISTICS COMPASS, 2016, 10 (01): 3-13
- [5] Beinborn Lisa, 2018, P 27 INT C COMP LING, P2325
- [6] Bommasani Rishi, 2020, P 58 ANN M ASS COMPU, P4758, DOI 10.18653/v1/2020.acl-main.431
- [8] Bruni Elia, 2012, Proceedings of the 20th ACM International Conference on Multimedia, P1219
- [10] Bugliarello Emanuele, 2021, T ASSOC COMPUT LING, DOI 10.1162/tacl_a_00408