Echo-Enhanced Embodied Visual Navigation

Cited by: 3
Authors
Yu, Yinfeng [1 ,2 ]
Cao, Lele [1 ,3 ]
Sun, Fuchun [1 ]
Yang, Chao [4 ]
Lai, Huicheng [2 ]
Huang, Wenbing [5 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Xinjiang Univ, Coll Informat Sci & Engn, Urumqi 830046, Peoples R China
[3] EQT, Motherbrain, S-11153 Stockholm, Sweden
[4] Shanghai AI Lab, Shanghai 200232, Peoples R China
[5] Tsinghua Univ, Inst AI Ind Res, Beijing 100084, Peoples R China
Keywords
Auditory signals; Cooperative Markov games; Performance degradation; Poor visibility; Robotic agents; Sensory input; Target location; Traditional approaches; Visual conditions; Visual navigation
DOI
10.1162/neco_a_01579
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Visual navigation involves a movable robotic agent striving to reach a point goal (target location) using visual sensory input. While navigation under ideal visibility has seen plenty of success, it becomes challenging in suboptimal visual conditions such as poor illumination, where traditional approaches suffer severe performance degradation. To mitigate this problem, we propose E3VN (echo-enhanced embodied visual navigation), which perceives the surroundings effectively even under poor visibility. This is made possible by adopting an echoer that actively perceives the environment via auditory signals. E3VN models the robot agent as playing a cooperative Markov game with the echoer: the action policies of the robot and the echoer are jointly optimized to maximize the reward in a two-stream actor-critic architecture, and during optimization the reward is adaptively decomposed into robot and echoer parts. Our experiments and ablation studies show that E3VN is consistently effective and robust in point goal navigation tasks, especially under nonideal visibility.
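To make the architecture described in the abstract concrete, the following is a minimal, illustrative sketch of a two-stream actor-critic in which a robot stream and an echoer stream are optimized jointly and a shared scalar reward is adaptively split between them. It is not the authors' implementation; all module names, dimensions, and the softmax-based decomposition gate are assumptions made for illustration only.

import torch
import torch.nn as nn
import torch.nn.functional as F


class Stream(nn.Module):
    """One actor-critic stream; instantiated once for the robot and once for the echoer."""

    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, num_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)              # critic: state-value estimate

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return self.policy_head(h), self.value_head(h).squeeze(-1)


class TwoStreamActorCritic(nn.Module):
    """Robot stream + echoer stream with a gate that splits the shared reward."""

    def __init__(self, obs_dim: int, robot_actions: int, echoer_actions: int):
        super().__init__()
        self.robot = Stream(obs_dim, robot_actions)
        self.echoer = Stream(obs_dim, echoer_actions)
        # Hypothetical gate: maps the joint observation to two weights summing to one,
        # used to decompose the environment reward into robot and echoer parts.
        self.reward_gate = nn.Linear(obs_dim, 2)

    def decompose_reward(self, obs: torch.Tensor, reward: torch.Tensor):
        w = F.softmax(self.reward_gate(obs), dim=-1)   # (batch, 2), rows sum to 1
        return w[:, 0] * reward, w[:, 1] * reward       # robot part, echoer part

    def loss(self, obs, robot_action, echoer_action, reward):
        """One-step actor-critic loss over a batch of joint transitions."""
        r_robot, r_echoer = self.decompose_reward(obs, reward)
        total = 0.0
        for stream, action, r in (
            (self.robot, robot_action, r_robot),
            (self.echoer, echoer_action, r_echoer),
        ):
            logits, value = stream(obs)
            log_prob = F.log_softmax(logits, dim=-1).gather(1, action.unsqueeze(1)).squeeze(1)
            advantage = (r - value).detach()             # stop-gradient for the policy term
            policy_loss = -(log_prob * advantage).mean()
            value_loss = ((value - r) ** 2).mean()       # critic regression (also trains the gate)
            total = total + policy_loss + 0.5 * value_loss
        return total


# Usage with random tensors standing in for environment transitions.
model = TwoStreamActorCritic(obs_dim=16, robot_actions=4, echoer_actions=3)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
obs = torch.randn(8, 16)
robot_a = torch.randint(0, 4, (8,))
echoer_a = torch.randint(0, 3, (8,))
reward = torch.randn(8)
loss = model.loss(obs, robot_a, echoer_a, reward)
optimizer.zero_grad()
loss.backward()
optimizer.step()

In this sketch the decomposition gate receives a learning signal only through the critic regression; the mechanism by which the paper actually adapts the reward decomposition during optimization may differ.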
Pages: 958-976
Page count: 19