共 35 条
- [1] Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 3674 - 3683
- [2] [Anonymous], 1997, Neural Computation
- [3] Bahdanau D, 2016, Arxiv, DOI [arXiv:1409.0473, DOI 10.48550/ARXIV.1409.0473]
- [4] Look and Think Twice: Capturing Top-Down Visual Attention with Feedback Convolutional Neural Networks [J]. 2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, : 2956 - 2964
- [5] SCA-CNN: Spatial and Channel-wise Attention in Convolutional Networks for Image Captioning [J]. 30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, : 6298 - 6306
- [6] Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions [J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, : 8299 - 8308
- [7] DENKOWSKI M., 2014, P 9 WORKSH STAT MACH
- [9] Graves A, 2013, INT CONF ACOUST SPEE, P6645, DOI 10.1109/ICASSP.2013.6638947
- [10] Jaderberg M, 2015, ADV NEUR IN, V28