52 entries in total
- [2] Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 6077-6086
- [3] VQA: Visual Question Answering [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015: 2425-2433
- [5] Cross-Modal Retrieval in the Cooking Context: Learning Semantic Text-Image Embeddings [J]. ACM/SIGIR PROCEEDINGS 2018, 2018: 35-44
- [6] IMRAM: Iterative Matching with Recurrent Attention Memory for Cross-Modal Image-Text Retrieval [J]. 2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020: 12652-12660
- [7] Chen TL, 2020, AAAI CONF ARTIF INTE, V34, P10583
- [8] Learning to Evaluate Image Captioning [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018: 5804-5812
- [9] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
- [10] Diao HW, 2021, AAAI CONF ARTIF INTE, V35, P1218