Topological Planning with Transformers for Vision-and-Language Navigation

Cited by: 51
Authors
Chen, Kevin [1]
Chen, Junshen K. [1]
Chuang, Jo [1]
Vazquez, Marynel [2]
Savarese, Silvio [1]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] Yale Univ, New Haven, CT 06520 USA
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Keywords
ROBOT
DOI
10.1109/CVPR46437.2021.01112
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Conventional approaches to vision-and-language navigation (VLN) are trained end-to-end but struggle to perform well in freely traversable environments. Inspired by the robotics community, we propose a modular approach to VLN using topological maps. Given a natural language instruction and a topological map, our approach leverages attention mechanisms to predict a navigation plan in the map. The plan is then executed with low-level actions (e.g., FORWARD, ROTATE) using a robust controller. Experiments show that our method outperforms previous end-to-end approaches, generates interpretable navigation plans, and exhibits intelligent behaviors such as backtracking.
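The two-stage pipeline in the abstract (plan over a topological map, then execute the plan with low-level actions) can be illustrated with a toy sketch. This is not the authors' model. The abstract does not specify the architecture, so a single scaled dot-product attention step from one instruction vector over node feature vectors stands in for the paper's transformer-based planner. The greedy decoding rule, the open-loop controller, the 15-degree turn and 0.25 m step sizes, the action names ROTATE_LEFT/ROTATE_RIGHT, the functions plan_path and plan_to_actions, and the toy map and embeddings are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy types: a topological map is node positions plus an adjacency list.
Positions = Dict[int, Tuple[float, float]]
Graph = Dict[int, List[int]]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: List[Sequence[float]]) -> List[float]:
    """Scaled dot-product attention weights of a single query over several keys."""
    scale = math.sqrt(len(query))
    return softmax([sum(q * k for q, k in zip(query, key)) / scale for key in keys])


def plan_path(graph: Graph, node_feats: Dict[int, List[float]],
              instr_emb: List[float], start: int, max_steps: int = 10) -> List[int]:
    """Hypothetical greedy planner: at each step the instruction embedding attends
    over the current node (interpreted as STOP) and its unvisited neighbours."""
    path = [start]
    for _ in range(max_steps):
        current = path[-1]
        candidates = [current] + [n for n in graph[current] if n not in path]
        weights = attend(instr_emb, [node_feats[c] for c in candidates])
        best = candidates[weights.index(max(weights))]
        if best == current:  # attention prefers staying put: stop planning
            break
        path.append(best)
    return path


def plan_to_actions(plan: List[int], positions: Positions, heading_deg: float = 0.0,
                    turn_deg: float = 15.0, step_m: float = 0.25) -> List[str]:
    """Hypothetical open-loop controller: rotate toward each next waypoint, then move
    forward. Angles are counter-clockwise from +x, so a positive turn is a left turn."""
    actions: List[str] = []
    x, y = positions[plan[0]]
    for node in plan[1:]:
        tx, ty = positions[node]
        bearing = math.degrees(math.atan2(ty - y, tx - x))
        delta = (bearing - heading_deg + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
        n_turns = round(abs(delta) / turn_deg)
        actions += ["ROTATE_LEFT" if delta > 0 else "ROTATE_RIGHT"] * n_turns
        heading_deg += math.copysign(n_turns * turn_deg, delta)
        actions += ["FORWARD"] * round(math.hypot(tx - x, ty - y) / step_m)
        x, y = tx, ty  # assume the waypoint is reached exactly (no feedback)
    return actions


# Toy 2x2 map; node features and the instruction embedding are made-up numbers.
positions: Positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (1.0, 1.0), 3: (0.0, 1.0)}
graph: Graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
node_feats = {0: [1.0, 0.0], 1: [0.6, 0.4], 2: [0.0, 1.0], 3: [0.7, 0.3]}
instr_emb = [0.0, 1.0]

plan = plan_path(graph, node_feats, instr_emb, start=0)
actions = plan_to_actions(plan, positions)
print(plan)     # [0, 1, 2]
print(actions)  # 4 x FORWARD, 6 x ROTATE_LEFT, 4 x FORWARD
```

Separating the planner from the controller mirrors the modular split the abstract describes: the plan is an explicit, inspectable node sequence, and the controller only has to reach the next waypoint. Because this toy planner excludes visited nodes, it cannot reproduce the backtracking behavior the abstract reports. The paper's controller is described as robust, which suggests closed-loop feedback rather than the open-loop assumption used here.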
Pages: 11271–11281
Page count: 11