共 65 条
- [1] Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 3674 - 3683
- [2] SPICE: Semantic Propositional Image Caption Evaluation [J]. COMPUTER VISION - ECCV 2016, PT V, 2016, 9909 : 382 - 398
- [3] [Anonymous], 2018, P EUR C COMP VIS ECC, DOI DOI 10.3892/MMR.2018.9013
- [4] [Anonymous], 2020, ARXIV200414255, DOI DOI 10.1145/3397271.3401093
- [5] Picture it in your mind: generating high level visual representations from textual descriptions [J]. INFORMATION RETRIEVAL JOURNAL, 2018, 21 (2-3): : 208 - 229
- [6] Chen Y.-C., 2019, arXiv
- [7] Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8299 - 8308
- [8] Devlin J, 2019, 2019 CONFERENCE OF THE NORTH AMERICAN CHAPTER OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS: HUMAN LANGUAGE TECHNOLOGIES (NAACL HLT 2019), VOL. 1, P4171
- [9] Linking Image and Text with 2-Way Nets [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 1855 - 1865
- [10] Faghri F, 2018, P BRIT MACH VIS C