Agent-Centric Relation Graph for Object Visual Navigation

Times Cited: 7
Authors
Hu, Xiaobo [1 ]
Lin, Youfang [1 ]
Wang, Shuo [1 ]
Wu, Zhihao [1 ]
Lv, Kai [1 ]
Affiliations
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing Key Lab Traff Data Anal & Min, Beijing 100044, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Navigation; Visualization; Visual perception; Task analysis; Transformers; Semantics; Sonar navigation; Object visual navigation; relation graph; depth estimation; reinforcement learning; REINFORCEMENT; SLAM;
DOI
10.1109/TCSVT.2023.3291131
CLC Number
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Object visual navigation aims to steer an agent toward a target object based on visual observations, which requires the agent to perceive the environment reliably and to be controlled accurately. For this task, we introduce an Agent-Centric Relation Graph (ACRG) that learns a visual representation from the relationships in the environment. ACRG is a highly effective structure consisting of two relationships: the horizontal relationship among objects and the distance relationship between the agent and objects. On the one hand, we design the Object Horizontal Relationship Graph (OHRG), which stores the relative horizontal locations among objects. On the other hand, we propose the Agent-Target Distance Relationship Graph (ATDRG), which enables the agent to perceive the distances between the target and other objects. For ATDRG, we utilize image depth to obtain target distances and exploit vertical locations to capture the distance relationships among objects in the vertical direction. With these graphs, the agent can perceive the environment and output navigation actions. Experimental results in the artificial environment AI2-THOR demonstrate that ACRG significantly outperforms other state-of-the-art methods in unseen testing environments.
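To make the two graphs above concrete, the following is a minimal PyTorch sketch of how OHRG and ATDRG could be assembled from per-object detections. Everything here is an illustrative assumption rather than the paper's released implementation: the class name ACRGSketch, the feature dimension, and the attention-style scoring of pairwise relations from horizontal centers (OHRG) versus depth and vertical centers (ATDRG) are all hypothetical.

# Hypothetical sketch of the two ACRG graphs described in the abstract.
# Assumptions (not from the paper's code): N detected objects per frame,
# each with a bounding box, a visual feature vector, and a depth estimate.
import torch
import torch.nn as nn


class ACRGSketch(nn.Module):
    def __init__(self, feat_dim: int = 256):
        super().__init__()
        # Learned projections used to score pairwise relations.
        self.q_h = nn.Linear(feat_dim + 1, feat_dim)  # +1: horizontal center x
        self.k_h = nn.Linear(feat_dim + 1, feat_dim)
        self.q_d = nn.Linear(feat_dim + 2, feat_dim)  # +2: depth, vertical center y
        self.k_d = nn.Linear(feat_dim + 2, feat_dim)
        self.v = nn.Linear(feat_dim, feat_dim)

    def forward(self, feats, boxes, depths):
        # feats:  (N, feat_dim) per-object visual features
        # boxes:  (N, 4) bounding boxes (x1, y1, x2, y2), normalized to [0, 1]
        # depths: (N, 1) estimated depth of each object
        cx = ((boxes[:, 0] + boxes[:, 2]) / 2).unsqueeze(1)  # horizontal centers
        cy = ((boxes[:, 1] + boxes[:, 3]) / 2).unsqueeze(1)  # vertical centers

        # OHRG: relations scored from features plus horizontal location only,
        # encoding the relative horizontal arrangement of objects.
        h_in = torch.cat([feats, cx], dim=1)
        ohrg = torch.softmax(
            self.q_h(h_in) @ self.k_h(h_in).T / feats.size(1) ** 0.5, dim=1)

        # ATDRG: relations scored from features plus depth and vertical location,
        # so the agent can reason about how far the target and objects are.
        d_in = torch.cat([feats, depths, cy], dim=1)
        atdrg = torch.softmax(
            self.q_d(d_in) @ self.k_d(d_in).T / feats.size(1) ** 0.5, dim=1)

        # Aggregate object features through both graphs and fuse.
        return ohrg @ self.v(feats) + atdrg @ self.v(feats)


if __name__ == "__main__":
    n, d = 5, 256  # five detected objects, toy feature size
    model = ACRGSketch(feat_dim=d)
    out = model(torch.randn(n, d), torch.rand(n, 4), torch.rand(n, 1))
    print(out.shape)  # torch.Size([5, 256]) -- relation-aware object features

Run as-is, the script prints torch.Size([5, 256]): relation-aware object features that a downstream reinforcement-learning policy could, in principle, consume to output navigation actions.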
Pages: 1295-1309
Number of pages: 15