Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Authors
Chen, Peihao [1]
Ji, Dongyu [1]
Lin, Kunyang [1, 2]
Zeng, Runhao [5]
Li, Thomas H. [6]
Tan, Mingkui [1, 7]
Gan, Chuang [3, 4]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
[3] MIT IBM Watson AI Lab, Cambridge, MA USA
[4] UMass Amherst, Amherst, MA USA
[5] Shenzhen Univ, Shenzhen, Peoples R China
[6] Peking Univ, Informat Technol R&D Innovat Ctr, Beijing, Peoples R China
[7] Minist Educ, Key Lab Big Data & Intelligent Robot, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Funding
National Natural Science Foundation of China; National Key Research and Development Program
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described in language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial locations and the semantic information of those objects. However, enabling a robot to build a map that represents the environment well is extremely challenging, as the environment often involves diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task, which requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively, on the VLN-CE dataset. Code is available at https://github.com/PeihaoChen/WS-MGMap.
Pages: 13