Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Authors
Chen, Peihao [1]
Ji, Dongyu [1]
Lin, Kunyang [1, 2]
Zeng, Runhao [5]
Li, Thomas H. [6]
Tan, Mingkui [1, 7]
Gan, Chuang [3, 4]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
[3] MIT IBM Watson AI Lab, Cambridge, MA USA
[4] UMass Amherst, Amherst, MA USA
[5] Shenzhen Univ, Shenzhen, Peoples R China
[6] Peking Univ, Informat Technol R&D Innovat Ctr, Beijing, Peoples R China
[7] Minist Educ, Key Lab Big Data & Intelligent Robot, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Funding
National Natural Science Foundation of China; National Key R&D Program of China
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described in language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial locations and the semantic information of the objects in the environment. However, enabling a robot to build such a map is extremely challenging, as the environment often involves diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task, which requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively, on the VLN-CE dataset. Code is available at https://github.com/PeihaoChen/WS-MGMap.
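The idea of a map that stores both fine-grained object details and semantic classes can be illustrated with a small sketch. This is not the authors' implementation (see the repository linked in the abstract for that); it assumes a top-down square grid, pre-computed per-observation feature vectors, and integer class labels. The function name build_multi_granularity_map and all of its parameters are hypothetical, chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def build_multi_granularity_map(
    points: Sequence[Tuple[float, float]],   # (x, y) positions in meters (assumed input)
    fine_feats: Sequence[Sequence[float]],   # fine-grained features, e.g. color/texture
    class_ids: Sequence[int],                # semantic class label per observation
    num_classes: int,
    grid_size: int,
    cell_m: float,                           # side length of one grid cell in meters
) -> List[List[List[float]]]:
    """Project observations onto a top-down grid.

    Each cell stores the mean fine-grained feature followed by a
    normalized semantic class histogram (hypothetical layout).
    """
    feat_dim = len(fine_feats[0])
    channels = feat_dim + num_classes
    grid = [[[0.0] * channels for _ in range(grid_size)] for _ in range(grid_size)]
    counts = [[0] * grid_size for _ in range(grid_size)]

    for (x, y), feat, cls in zip(points, fine_feats, class_ids):
        col, row = int(x // cell_m), int(y // cell_m)
        if not (0 <= row < grid_size and 0 <= col < grid_size):
            continue  # observation falls outside the mapped area
        cell = grid[row][col]
        for i, value in enumerate(feat):
            cell[i] += value              # accumulate fine-grained channels
        cell[feat_dim + cls] += 1.0       # accumulate semantic class votes
        counts[row][col] += 1

    # Average every populated cell over the observations that landed in it.
    for row in range(grid_size):
        for col in range(grid_size):
            n = counts[row][col]
            if n:
                grid[row][col] = [value / n for value in grid[row][col]]
    return grid


demo_map = build_multi_granularity_map(
    points=[(0.5, 0.5), (0.7, 0.2), (1.5, 0.5), (9.0, 9.0)],
    fine_feats=[[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 1.0]],
    class_ids=[0, 1, 1, 0],
    num_classes=2,
    grid_size=2,
    cell_m=1.0,
)
```

In this sketch the two granularities simply share one channel axis per cell, so a downstream module (such as a waypoint predictor) could read both the appearance features and the class distribution from the same map location.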
Pages: 13