Echo-Enhanced Embodied Visual Navigation

Cited by: 3
Authors
Yu, Yinfeng [1 ,2 ]
Cao, Lele [1 ,3 ]
Sun, Fuchun [1 ]
Yang, Chao [4 ]
Lai, Huicheng [2 ]
Huang, Wenbing [5 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[3] EQT, Motherbrain, S-11153 Stockholm, Sweden
[4] Shanghai AI Lab, Shanghai 200232, Peoples R China
[5] Tsinghua Univ, Inst AI Ind Res, Beijing 100084, Peoples R China
Keywords
Auditory signals; Cooperative Markov games; Performance degradation; Poor visibility; Robotic agents; Sensory input; Target location; Traditional approaches; Visual conditions; Visual navigation
DOI
10.1162/neco_a_01579
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Visual navigation involves a movable robotic agent striving to reach a point goal (target location) using visual sensory input. While navigation under ideal visibility has seen plenty of success, it becomes challenging in suboptimal visual conditions such as poor illumination, where traditional approaches suffer severe performance degradation. To mitigate this problem, we propose E3VN (echo-enhanced embodied visual navigation), which perceives the surroundings effectively even under poor visibility. This is made possible by adopting an echoer that actively perceives the environment via auditory signals. E3VN models the robot agent as playing a cooperative Markov game with the echoer: the action policies of the robot and the echoer are jointly optimized to maximize the reward in a two-stream actor-critic architecture, and during optimization the reward is adaptively decomposed into robot and echoer parts. Our experiments and ablation studies show that E3VN is consistently effective and robust in point goal navigation tasks, especially under nonideal visibility.
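To make the architecture described in the abstract concrete, the following is a minimal, illustrative sketch of a two-stream actor-critic in which a robot stream and an echoer stream are optimized jointly and a shared scalar reward is adaptively split between them. It is not the authors' implementation; all module names, dimensions, and the softmax-based decomposition gate are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Stream(nn.Module):
    """One actor-critic stream; instantiated once for the robot and once for the echoer."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, num_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)              # critic: state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


class TwoStreamActorCritic(nn.Module):
    """Robot stream + echoer stream with a gate that splits the shared reward."""

    def __init__(self, obs_dim: int, robot_actions: int, echoer_actions: int):
        super().__init__()
        self.robot = Stream(obs_dim, robot_actions)
        self.echoer = Stream(obs_dim, echoer_actions)
        # Hypothetical gate: maps the joint observation to two weights summing to one,
        # used to decompose the environment reward into robot and echoer parts.
        self.reward_gate = nn.Linear(obs_dim, 2)

    def decompose_reward(self, obs: torch.Tensor, reward: torch.Tensor):
        w = F.softmax(self.reward_gate(obs), dim=-1)   # (batch, 2), rows sum to 1
        return w[:, 0] * reward, w[:, 1] * reward       # robot part, echoer part

    def loss(self, obs, robot_action, echoer_action, reward):
        """One-step actor-critic loss over a batch of joint transitions."""
        r_robot, r_echoer = self.decompose_reward(obs, reward)
        total = 0.0
        for stream, action, r in (
            (self.robot, robot_action, r_robot),
            (self.echoer, echoer_action, r_echoer),
        ):
            logits, value = stream(obs)
            log_prob = F.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
            advantage = (r - value).detach()             # stop-gradient for the policy term
            policy_loss = -(log_prob * advantage).mean()
            value_loss = ((value - r) ** 2).mean()       # critic regression (also trains the gate)
            total = total + policy_loss + 0.5 * value_loss
        return total


# Usage with random tensors standing in for environment transitions.
model = TwoStreamActorCritic(obs_dim=16, robot_actions=4, echoer_actions=3)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randn(8, 16)
robot_a = torch.randint(0, 4, (8,))
echoer_a = torch.randint(0, 3, (8,))
reward = torch.randn(8)
loss = model.loss(obs, robot_a, echoer_a, reward)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In this sketch the decomposition gate receives a learning signal only through the critic regression; the mechanism by which the paper actually adapts the reward decomposition during optimization may differ.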
Pages: 958-976
Page count: 19