Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Cited by: 0

Authors:

Chen, Peihao [1]

Ji, Dongyu [1]

Lin, Kunyang [1,2]

Zeng, Runhao [5]

Li, Thomas H. [6]

Tan, Mingkui [1,7]

Gan, Chuang [3,4]

Affiliations:

[1] South China Univ Technol, Guangzhou, Peoples R China

[2] Pazhou Lab, Guangzhou, Peoples R China

[3] MIT IBM Watson AI Lab, Cambridge, MA USA

[4] UMass Amherst, Amherst, MA USA

[5] Shenzhen Univ, Shenzhen, Peoples R China

[6] Peking Univ, Informat Technol R&D Innovat Ctr, Beijing, Peoples R China

[7] Minist Educ, Key Lab Big Data & Intelligent Robot, Beijing, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022

Funding:

National Natural Science Foundation of China; National Key R&D Program of China;

Keywords:

DOI:

Not available

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We address a practical yet challenging problem of training robot agents to navigate in an environment following a path described by language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial location and the semantic information of objects in the environment. However, enabling a robot to build a map that represents the environment well is extremely challenging, as the environment often involves diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task, which requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and instruction to a waypoint predictor to determine the next navigation goal. Experimental results show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively, on the VLN-CE dataset. Code is available at https://github.com/PeihaoChen/WS-MGMap.


Pages: 13
