40 entries in total
[1]
Anderson Peter, 2017, VISION LANGUAGE NAVI
[2]
[Anonymous], Simple baseline for visual question answering
[3]
[Anonymous], 2018, IEEE INT C ROB AUT I
[4]
Berg T., 2014, EMNLP, P787
[5]
nuScenes: A multimodal dataset for autonomous driving [J]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), 2020: 11618-11628
[6]
TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments [J]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), 2019: 12530-12539
[7]
Cirik Volkan, 2018, ARXIV180511818
[8]
The Cityscapes Dataset for Semantic Urban Scene Understanding [J]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016: 3213-3223
[9]
Das Abhishek, 2017, Embodied question answering
[10]
de Vries Harm, 2018, arXiv