TOUCHDOWN: Natural Language Navigation and Spatial Reasoning in Visual Street Environments

Cited by: 172
Authors
Chen, Howard [1, 4]
Suhr, Alane [2, 3]
Misra, Dipendra [2, 3]
Snavely, Noah [2, 3]
Artzi, Yoav [2, 3]
Affiliations
[1] ASAPP Inc, New York, NY 10007 USA
[2] Cornell Univ, Dept Comp Sci, New York, NY 10021 USA
[3] Cornell Univ, Cornell Tech, New York, NY 10021 USA
[4] Cornell Univ, New York, NY 10021 USA
Source
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019) | 2019
DOI
10.1109/CVPR.2019.01282
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We study the problem of jointly reasoning about language and vision through a navigation and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent must first follow navigation instructions in a Street View environment to a goal position, and then guess a location in its observed environment described in natural language to find a hidden object. The data contains 9326 examples of English instructions and spatial descriptions paired with demonstrations. We perform qualitative linguistic analysis, and show that the data displays a rich use of spatial reasoning. Empirical analysis shows the data presents an open challenge to existing methods.
Pages: 12530-12539
Page count: 10