Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Citations: 0
Authors
Chen, Peihao [1 ]
Ji, Dongyu [1 ]
Lin, Kunyang [1 ,2 ]
Zeng, Runhao [5 ]
Li, Thomas H. [6 ]
Tan, Mingkui [1 ,7 ]
Gan, Chuang [3 ,4 ]
Affiliations
[1] South China Univ Technol, Guangzhou, Peoples R China
[2] Pazhou Lab, Guangzhou, Peoples R China
[3] MIT IBM Watson AI Lab, Cambridge, MA USA
[4] UMass Amherst, Amherst, MA USA
[5] Shenzhen Univ, Shenzhen, Peoples R China
[6] Peking Univ, Informat Technol R&D Innovat Ctr, Beijing, Peoples R China
[7] Minist Educ, Key Lab Big Data & Intelligent Robot, Beijing, Peoples R China
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405;
Abstract
We address a practical yet challenging problem: training robot agents to navigate an environment by following a path described in natural-language instructions. The instructions often describe objects in the environment. To navigate accurately and efficiently, it is critical to build a map that represents both the spatial locations and the semantic information of objects in the environment. However, building such a map is extremely challenging, as environments often contain diverse objects with varied attributes. In this paper, we propose a multi-granularity map, which captures both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task that requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize instruction-relevant objects for navigation but is also encouraged to learn a map representation that better reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art in success rate by 4.0% in seen environments and 4.6% in unseen environments on the VLN-CE dataset. Code is available at https://github.com/PeihaoChen/WS-MGMap.
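To make the pipeline in the abstract concrete, below is a minimal PyTorch sketch of the three ideas it names: fusing a fine-grained feature map with a semantic-class map into one multi-granularity map, an auxiliary head that localizes instruction-relevant objects on that map (trainable with only weak, instruction-level supervision rather than cell-level labels), and a waypoint predictor over map cells. All module names, dimensions, and tensor shapes here are illustrative assumptions, not the authors' implementation; the real code is in the linked repository.

```python
# Hypothetical sketch of a multi-granularity map pipeline (names/shapes assumed).
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiGranularityMap(nn.Module):
    """Fuses a fine-grained feature map (e.g., color/texture features projected
    to a top-down grid) with a coarse per-cell semantic-class map."""
    def __init__(self, fine_dim=32, num_classes=21, fused_dim=64):
        super().__init__()
        self.fuse = nn.Conv2d(fine_dim + num_classes, fused_dim, kernel_size=3, padding=1)

    def forward(self, fine_map, semantic_map):
        # fine_map:     (B, fine_dim, H, W); semantic_map: (B, num_classes, H, W)
        return F.relu(self.fuse(torch.cat([fine_map, semantic_map], dim=1)))

class InstructionObjectLocalizer(nn.Module):
    """Weakly-supervised auxiliary head: a heatmap over map cells for objects
    mentioned in the instruction, via instruction-to-cell attention."""
    def __init__(self, fused_dim=64, instr_dim=128):
        super().__init__()
        self.instr_proj = nn.Linear(instr_dim, fused_dim)

    def forward(self, fused_map, instr_emb):
        q = self.instr_proj(instr_emb)                       # (B, fused_dim)
        logits = torch.einsum('bc,bchw->bhw', q, fused_map)  # (B, H, W)
        B = logits.size(0)
        return F.softmax(logits.view(B, -1), dim=-1).view_as(logits)

class WaypointPredictor(nn.Module):
    """Predicts a distribution over map cells as the next navigation goal."""
    def __init__(self, fused_dim=64, instr_dim=128):
        super().__init__()
        self.head = nn.Conv2d(fused_dim + instr_dim, 1, kernel_size=1)

    def forward(self, fused_map, instr_emb):
        B, _, H, W = fused_map.shape
        instr_tiled = instr_emb[:, :, None, None].expand(B, -1, H, W)
        logits = self.head(torch.cat([fused_map, instr_tiled], dim=1)).squeeze(1)
        return F.softmax(logits.view(B, -1), dim=-1).view(B, H, W)

# Toy forward pass with random tensors.
B, H, W = 2, 48, 48
fine = torch.randn(B, 32, H, W)
sem = torch.softmax(torch.randn(B, 21, H, W), dim=1)
instr = torch.randn(B, 128)

mg_map = MultiGranularityMap()(fine, sem)
obj_heatmap = InstructionObjectLocalizer()(mg_map, instr)  # auxiliary-task output
waypoint = WaypointPredictor()(mg_map, instr)              # next-goal distribution
print(obj_heatmap.shape, waypoint.shape)                   # (2, 48, 48) each
```

In a weakly-supervised setup like the one the abstract describes, the localizer's heatmap would be trained against coarse signals derived from the instruction (e.g., which object classes are mentioned) rather than ground-truth cell annotations, so improving the auxiliary loss pressures the shared map representation to encode object information.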
Pages: 13