Curriculum Learning for Vision-and-Language Navigation

Cited by: 0
Authors
Zhang, Jiwen [1 ]
Wei, Zhongyu [1 ,2 ]
Fan, Jianqing [1 ,3 ]
Peng, Jiajie [2 ]
Affiliations
[1] Fudan Univ, Sch Data Sci, Shanghai, Peoples R China
[2] Fudan Univ, Res Inst Intelligent & Complex Syst, Shanghai, Peoples R China
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Vision-and-Language Navigation (VLN) is a task where an agent navigates in an embodied indoor environment under human instructions. Previous works ignore the distribution of sample difficulty, and we argue that this may degrade their agents' performance. To tackle this issue, we propose a novel curriculum-based training paradigm for VLN tasks that can balance human prior knowledge and agent learning progress about training samples. We develop the principle of curriculum design and re-arrange the benchmark Room-to-Room (R2R) dataset to make it suitable for curriculum training. Experiments show that our method is model-agnostic and can significantly improve the performance, the generalizability, and the training efficiency of current state-of-the-art navigation agents without increasing model complexity.
Pages: 12