NavTr: Object-Goal Navigation With Learnable Transformer Queries

Times Cited: 0
Authors
Mao, Qiuyu [1 ]
Wang, Jikai [1 ]
Xu, Meng [1 ]
Chen, Zonghai [1 ]
Affiliations
[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China
Source
IEEE ROBOTICS AND AUTOMATION LETTERS | 2024, Vol. 9, No. 12
Funding
National Natural Science Foundation of China
Keywords
Navigation; Transformers; Semantics; Visualization; Vectors; Three-dimensional displays; Long short-term memory; Encoding; Computer architecture; Aggregates; Vision-based navigation; Representation learning; Reinforcement learning
DOI
10.1109/LRA.2024.3497718
CLC Number (Chinese Library Classification)
TP24 [Robotics]
Discipline Code
080202; 1405
Abstract
This letter introduces Navigation Transformer (NavTr), a novel framework for object-goal navigation that uses Transformer queries to enhance the learning and representation of environment states. By integrating semantic information, object positions, and neighborhood information, NavTr creates a unified, comprehensive, and extensible state representation for the object-goal navigation task. Within the framework, the Transformer queries implicitly learn inter-object relationships, which facilitates a high-level understanding of the environment. Additionally, NavTr implements target-oriented supervisory signals, such as rotation rewards and a spatial loss, which improve exploration efficiency within the reinforcement learning framework. NavTr outperforms popular graph-based and attention-based methods by a large margin in terms of success rate (SR) and success weighted by path length (SPL). Extensive experiments on the AI2-THOR dataset demonstrate the effectiveness of our approach.
Pages: 11738 - 11745
Number of pages: 8
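
Illustrative sketch (not from the paper): the abstract above describes learnable Transformer queries that aggregate semantic, positional, and neighborhood information about detected objects into a single navigation state. The minimal PyTorch sketch below shows one way such a query-based state encoder could look, in the DETR style of learned queries cross-attending to object features. The module name QueryStateEncoder, every dimension, and the exact composition of the object features are assumptions made for illustration; this is not the authors' implementation.

    # Minimal sketch of a learnable-query state encoder (assumptions only,
    # not NavTr's released code).
    import torch
    import torch.nn as nn


    class QueryStateEncoder(nn.Module):
        """Aggregates per-object features into a fixed-size navigation state
        using a set of learnable Transformer decoder queries (DETR-style)."""

        def __init__(self, num_queries=8, d_model=128, n_heads=4, n_layers=2,
                     num_classes=100):
            super().__init__()
            # Learnable queries: each query can specialize to a different
            # aspect of the scene (e.g. target-related objects, free space).
            self.queries = nn.Parameter(torch.randn(num_queries, d_model))
            # Semantic class embedding, object position projection (here a
            # normalized bounding box), and a neighborhood-feature projection
            # are all mapped into the same d_model space and summed.
            self.class_embed = nn.Embedding(num_classes, d_model)
            self.pos_proj = nn.Linear(4, d_model)
            self.neigh_proj = nn.Linear(d_model, d_model)
            layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            self.decoder = nn.TransformerDecoder(layer, num_layers=n_layers)
            self.out = nn.Linear(num_queries * d_model, d_model)

        def forward(self, class_ids, boxes, neigh_feats):
            # class_ids: (B, N) int64, boxes: (B, N, 4), neigh_feats: (B, N, d_model)
            obj = (self.class_embed(class_ids)
                   + self.pos_proj(boxes)
                   + self.neigh_proj(neigh_feats))       # (B, N, d_model)
            q = self.queries.unsqueeze(0).expand(obj.size(0), -1, -1)
            # Queries cross-attend to the detected objects; inter-object
            # relations are captured implicitly through attention.
            dec = self.decoder(tgt=q, memory=obj)         # (B, Q, d_model)
            return self.out(dec.flatten(1))               # (B, d_model) state vector


    if __name__ == "__main__":
        enc = QueryStateEncoder()
        state = enc(torch.randint(0, 100, (2, 12)),       # object classes
                    torch.rand(2, 12, 4),                 # normalized boxes
                    torch.randn(2, 12, 128))              # neighborhood features
        print(state.shape)  # torch.Size([2, 128])

The resulting state vector would then be fed to a reinforcement-learning policy; the target-oriented rotation rewards and spatial loss mentioned in the abstract are training signals applied on top of such an encoder and are not shown here.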