Weakly-Supervised Multi-Granularity Map Learning for Vision-and-Language Navigation

Times cited: 0

Authors:

Chen, Peihao [1]

Ji, Dongyu [1]

Lin, Kunyang [1, 2]

Zeng, Runhao [5]

Li, Thomas H. [6]

Tan, Mingkui [1, 7]

Gan, Chuang [3, 4]

Affiliations:

[1] South China University of Technology, Guangzhou, People's Republic of China

[2] Pazhou Lab, Guangzhou, People's Republic of China

[3] MIT-IBM Watson AI Lab, Cambridge, MA, USA

[4] University of Massachusetts Amherst, Amherst, MA, USA

[5] Shenzhen University, Shenzhen, People's Republic of China

[6] Peking University, Information Technology R&D Innovation Center, Beijing, People's Republic of China

[7] Ministry of Education, Key Laboratory of Big Data & Intelligent Robot, Beijing, People's Republic of China

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022) | 2022

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We address the practical yet challenging problem of training robot agents to navigate an environment by following a path described in language instructions. The instructions often contain descriptions of objects in the environment. To achieve accurate and efficient navigation, it is critical to build a map that accurately represents both the spatial locations and the semantic information of objects in the environment. However, enabling a robot to build a map that represents the environment well is extremely challenging, as environments often contain diverse objects with various attributes. In this paper, we propose a multi-granularity map, which contains both fine-grained object details (e.g., color, texture) and semantic classes, to represent objects more comprehensively. Moreover, we propose a weakly-supervised auxiliary task, which requires the agent to localize instruction-relevant objects on the map. Through this task, the agent not only learns to localize the instruction-relevant objects for navigation but is also encouraged to learn a better map representation that reveals object information. We then feed the learned map and instruction to a waypoint predictor to determine the next navigation goal. Experimental results on the VLN-CE dataset show that our method outperforms the state of the art in success rate by 4.0% and 4.6% in seen and unseen environments, respectively. Code is available at https://github.com/PeihaoChen/WS-MGMap.
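The abstract describes the map and the auxiliary task only at a high level. Below is a minimal, hypothetical Python sketch of one way to read it, not the authors' implementation. It assumes that each map cell concatenates a fine-grained appearance feature with a one-hot semantic class vector, and that the weak label for the localization task marks cells whose semantic class name appears as a word in the instruction. The function names (`build_multi_granularity_cell`, `relevance_target`, `bce_loss`) and the toy map are illustrative assumptions and are not taken from the repository above.

```python
import math
from typing import List, Sequence


def build_multi_granularity_cell(fine: Sequence[float], sem_class: int, num_classes: int) -> List[float]:
    """Hypothetical cell encoding: concatenate a fine-grained feature (e.g. a
    color/texture embedding) with a one-hot semantic class vector.
    sem_class < 0 means "no object in this cell"."""
    one_hot = [0.0] * num_classes
    if 0 <= sem_class < num_classes:
        one_hot[sem_class] = 1.0
    return list(fine) + one_hot


def relevance_target(sem_map: List[List[int]], class_names: List[str], instruction: str) -> List[List[float]]:
    """Weak label (assumption): 1.0 for cells whose class name occurs as a word
    in the instruction, 0.0 elsewhere. No per-cell human annotation is needed."""
    words = set(instruction.lower().replace(",", " ").replace(".", " ").split())
    relevant = {i for i, name in enumerate(class_names) if name in words}
    return [[1.0 if c in relevant else 0.0 for c in row] for row in sem_map]


def bce_loss(pred: List[List[float]], target: List[List[float]], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy between a predicted relevance heatmap and the weak target."""
    total, n = 0.0, 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
            total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
            n += 1
    return total / n


# Usage: a 2x2 map with classes chair=0, table=1, sofa=2 and one empty cell (-1).
sem_map = [[0, 1], [2, -1]]
target = relevance_target(sem_map, ["chair", "table", "sofa"], "Walk past the sofa and stop at the table.")
print(target)  # [[0.0, 1.0], [1.0, 0.0]]
print(build_multi_granularity_cell([0.2, 0.7], 1, 3))  # [0.2, 0.7, 0.0, 1.0, 0.0]
print(round(bce_loss([[0.5, 0.5], [0.5, 0.5]], target), 4))  # 0.6931
```

Concatenation keeps both granularities available to whatever layer consumes the map. In a trained system the fine-grained features and the predicted heatmap would come from learned networks, and the waypoint predictor would consume the map and instruction; this sketch omits all of that and only illustrates the shape of the weakly supervised target and loss.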

Pages: 13
