Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Cited by: 0
Authors
Chen, Peihao [1]
Ji, Dongyu [1]
Lin, Kunyang [1, 2]
Zeng, Runhao [5]
Li, Thomas H. [6]
Tan, Mingkui [1, 7]
Gan, Chuang [3, 4]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
[3] MIT IBM Watson AI Lab, Cambridge, MA USA
[4] UMass Amherst, Amherst, MA USA
[5] Shenzhen Univ, Shenzhen, Peoples R China
[6] Peking Univ, Informat Technol R&D Innovat Ctr, Beijing, Peoples R China
[7] Minist Educ, Key Lab Big Data & Intelligent Robot, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described in language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial location and the semantic information of the objects in the environment. However, enabling a robot to build a map that represents the environment well is extremely challenging, as the environment often involves diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task, which requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively, on the VLN-CE dataset. Code is available at https://github.com/PeihaoChen/WS-MGMap.
Pages: 13
Related papers
50 in total
  • [41] Weakly-supervised butterfly detection based on saliency map
    Zhang, Ting
    Waqas, Muhammad
    Fang, Yu
    Liu, Zhaoying
    Halim, Zahid
    Li, Yujian
    Chen, Sheng
    PATTERN RECOGNITION, 2023, 138
  • [42] Multi-modal Adapter for Medical Vision-and-Language Learning
    Yu, Zheng
    Qiao, Yanyuan
    Xie, Yutong
    Wu, Qi
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I, 2024, 14348 : 393 - 402
  • [43] Discovering Intrinsic Subgoals for Vision-and-Language Navigation via Hierarchical Reinforcement Learning
    Wang, Jiawei
    Wang, Teng
    Xu, Lele
    He, Zichen
    Sun, Changyin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (04) : 6516 - 6528
  • [44] Multi-Granularity Causal Structure Learning
    Liang, Jiaxuan
    Wang, Jun
    Yu, Guoxian
    Xia, Shuyin
    Wang, Guoyin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13727 - 13735
  • [45] Learning from Unlabeled 3D Environments for Vision-and-Language Navigation
    Chen, Shizhe
    Guhur, Pierre-Louis
    Tapaswi, Makarand
    Schmid, Cordelia
    Laptev, Ivan
    COMPUTER VISION, ECCV 2022, PT XXXIX, 2022, 13699 : 638 - 655
  • [46] A New Path: Scaling Vision-and-Language Navigation with Synthetic Instructions and Imitation Learning
    Kamath, Aishwarya
    Anderson, Peter
    Wang, Su
    Koh, Jing Yu
    Ku, Alexander
    Waters, Austin
    Yang, Yinfei
    Baldridge, Jason
    Parekh, Zarana
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 10813 - 10823
  • [47] Bridging the Gap Between Learning in Discrete and Continuous Environments for Vision-and-Language Navigation
    Hong, Yicong
    Wang, Zun
    Wu, Qi
    Gould, Stephen
    2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022, : 15418 - 15428
  • [48] Weakly-supervised Learning of Schrödinger Equation
    Shiina, Kenta
    Lee, Hwee Kuan
    Okabe, Yutaka
    Mori, Hiroyuki
    JOURNAL OF THE PHYSICAL SOCIETY OF JAPAN, 2024, 93 (06)
  • [49] Weakly-Supervised Reinforcement Learning for Controllable Behavior
    Lee, Lisa
    Eysenbach, Benjamin
    Salakhutdinov, Ruslan
    Gu, Shane
    Finn, Chelsea
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [50] Visual Perception Generalization for Vision-and-Language Navigation via Meta-Learning
    Wang, Ting
    Wu, Zongkai
    Wang, Donglin
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34 (08) : 5193 - 5199