Agent-Centric Relation Graph for Object Visual Navigation

Cited by: 7
Authors
Hu, Xiaobo [1 ]
Lin, Youfang [1 ]
Wang, Shuo [1 ]
Wu, Zhihao [1 ]
Lv, Kai [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Navigation; Visualization; Visual perception; Task analysis; Transformers; Semantics; Sonar navigation; Object visual navigation; relation graph; depth estimation; reinforcement learning; REINFORCEMENT; SLAM;
DOI
10.1109/TCSVT.2023.3291131
Chinese Library Classification (CLC)
TM [Electrical engineering]; TN [Electronics and communication technology];
Discipline codes
0808; 0809;
Abstract
Object visual navigation aims to steer an agent toward a target object based on visual observations. Doing so requires the agent both to perceive the environment well and to be controlled accurately. For this task, we introduce an Agent-Centric Relation Graph (ACRG) that learns a visual representation from relationships in the environment. ACRG is a highly effective structure consisting of two relationships: the horizontal relationship among objects and the distance relationship between the agent and objects. On the one hand, we design the Object Horizontal Relationship Graph (OHRG), which stores the relative horizontal locations among objects. On the other hand, we propose the Agent-Target Distance Relationship Graph (ATDRG), which enables the agent to perceive the distance between the target and objects. For the ATDRG, we utilize image depth to obtain the target distance and employ the vertical location to capture the distance relationship among objects in the vertical direction. With these graphs, the agent can perceive the environment and output navigation actions. Experimental results in the simulated environment AI2-THOR demonstrate that ACRG significantly outperforms other state-of-the-art methods in unseen testing environments.
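To make the two relation structures concrete, the sketch below shows one plausible way to assemble the raw inputs the abstract describes: pairwise horizontal offsets among detected objects (OHRG-like) and per-object features combining estimated depth with vertical image position (ATDRG-like). This is a minimal illustration under assumed inputs (detection boxes and depth estimates), not the authors' implementation; the function name and feature layout are hypothetical.

```python
import numpy as np

def build_acrg_features(boxes, depths):
    """Illustrative sketch of agent-centric relation-graph inputs.

    boxes:  (N, 4) array of [x1, y1, x2, y2] object detections
    depths: (N,) array of per-object depth estimates
            (agent-to-object distances from a depth map)

    Returns:
      horiz: (N, N) pairwise relative horizontal offsets (OHRG-like)
      dist:  (N, N, 2) features pairing each object's depth with its
             vertical offset to every other object (ATDRG-like)
    """
    cx = (boxes[:, 0] + boxes[:, 2]) / 2.0  # horizontal box centers
    cy = (boxes[:, 1] + boxes[:, 3]) / 2.0  # vertical box centers

    # OHRG: relative horizontal location among objects.
    # horiz[i, j] is the horizontal offset from object i to object j.
    horiz = cx[None, :] - cx[:, None]

    # ATDRG: agent-object depth for each column object, plus vertical
    # offsets as a proxy for the distance relationship in the vertical
    # direction.
    n = len(depths)
    vert = cy[None, :] - cy[:, None]
    dist = np.stack([np.broadcast_to(depths, (n, n)), vert], axis=-1)
    return horiz, dist
```

In the paper these relations are learned as graph edge weights over object features rather than computed in closed form; the sketch only shows the geometric quantities the two graphs are built from.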
Pages: 1295-1309
Page count: 15