Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Times cited: 0

Authors:

Chen, Peihao [1]

Ji, Dongyu [1]

Lin, Kunyang [1, 2]

Zeng, Runhao [5]

Li, Thomas H. [6]

Tan, Mingkui [1, 7]

Gan, Chuang [3, 4]

Affiliations:

[1] South China Univ Technol, Guangzhou, Peoples R China

[2] Pazhou Lab, Guangzhou, Peoples R China

[3] MIT IBM Watson AI Lab, Cambridge, MA USA

[4] UMass Amherst, Amherst, MA USA

[5] Shenzhen Univ, Shenzhen, Peoples R China

[6] Peking Univ, Informat Technol R&D Innovat Ctr, Beijing, Peoples R China

[7] Minist Educ, Key Lab Big Data & Intelligent Robot, Beijing, Peoples R China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022

Funding:

National Key R&D Program of China; National Natural Science Foundation of China

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

We address the practical yet challenging problem of training robot agents to navigate an environment along a path described by language instructions. The instructions often contain descriptions of objects in the environment. To navigate accurately and efficiently, it is critical to build a map that accurately represents both the spatial locations and the semantic information of environment objects. However, enabling a robot to build such a map is extremely challenging because the environment often contains diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task that requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and the instruction to a waypoint predictor to determine the next navigation goal. Experimental results on the VLN-CE dataset show that our method outperforms the state of the art by 4.0% and 4.6% in success rate in seen and unseen environments, respectively. Code is available at https://github.com/PeihaoChen/WS-MGMap.
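The abstract describes the map and the auxiliary task only at a high level. The following is a minimal NumPy sketch of the two ideas, not the authors' implementation (which is in the linked repository). It assumes a top-down grid map, stands in arbitrary per-cell feature vectors for the "fine-grained details", derives a hypothetical weak target by marking cells whose semantic class is mentioned in the instruction, and scores a predicted localization heatmap with per-cell binary cross-entropy. All function names and the weak-label rule are hypothetical.

```python
import numpy as np


def build_multi_granularity_map(fine_feats: np.ndarray, sem_labels: np.ndarray,
                                num_classes: int) -> np.ndarray:
    """Hypothetical map: stack fine-grained per-cell features with one-hot classes.

    fine_feats: (H, W, C) float features (assumed stand-in for colour/texture cues).
    sem_labels: (H, W) integer class ids in [0, num_classes).
    Returns an (H, W, C + num_classes) array.
    """
    one_hot = np.eye(num_classes, dtype=fine_feats.dtype)[sem_labels]
    return np.concatenate([fine_feats, one_hot], axis=-1)


def weak_localization_target(sem_labels: np.ndarray, mentioned_classes) -> np.ndarray:
    """Hypothetical weak label: 1.0 on cells whose class the instruction mentions."""
    return np.isin(sem_labels, list(mentioned_classes)).astype(np.float32)


def localization_loss(logits: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Mean per-cell binary cross-entropy between a predicted heatmap and the target."""
    probs = 1.0 / (1.0 + np.exp(-logits))  # sigmoid
    probs = np.clip(probs, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(probs) + (1.0 - target) * np.log(1.0 - probs)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    H, W, C, K = 4, 4, 3, 5
    fine = rng.standard_normal((H, W, C)).astype(np.float32)
    sem = rng.integers(0, K, size=(H, W))
    mg_map = build_multi_granularity_map(fine, sem, K)
    print(mg_map.shape)  # (4, 4, 8)
    target = weak_localization_target(sem, {2})
    # Toy "localizer" that scores each cell by its class-2 channel only.
    logits = 10.0 * mg_map[..., C + 2] - 5.0
    print(localization_loss(logits, target))
```

Concatenating the two granularities channel-wise keeps every map cell aligned, so a downstream module (here only a toy linear score) can draw on appearance cues and class identity at the same location; a learned localizer would replace the hand-written logits.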

Pages: 13