Visual Navigation Subject to Embodied Mismatch

Cited by: 1
Authors
Liu, Xinzhu [1 ,2 ]
Guo, Di [3 ]
Liu, Huaping [1 ,2 ]
Zhang, Xinyu [4 ]
Sun, Fuchun [1 ,2 ]
Affiliations
[1] Tsinghua Univ, Dept Comp Sci & Technol, Beijing 100084, Peoples R China
[2] Tsinghua Univ, Beijing Natl Res Ctr Informat Sci & Technol, Beijing 100084, Peoples R China
[3] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China
[4] Tsinghua Univ, Sch Vehicle & Mobil, Beijing 100084, Peoples R China
Keywords
Robots; different action spaces; embodied visual navigation; robust adversary learning
DOI
10.1109/TCDS.2023.3238840
CLC Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In the embodied visual navigation task, an agent navigates to a target location based on the visual observations it collects while interacting with the environment. Various approaches have been proposed to learn robust navigation strategies for this task. However, existing approaches assume that the action spaces in the training and testing phases are identical, which is usually not the case in reality, making it difficult to apply them directly to practical scenarios. In this article, we consider the situation where the action spaces in the training and testing phases differ, and we propose a novel task of visual navigation subject to embodied mismatch. To solve this task, we establish a two-stage robust adversary learning framework that learns a robust policy and adapts the learned model to a new action space. In the first stage, an adversary training mechanism is used to learn a robust feature representation of the state. In the second stage, adaptation training transfers the learned strategy to a new action space with fewer training samples. Experiments on three types of embodied visual navigation tasks are conducted in 3-D indoor scenes, demonstrating the effectiveness of the proposed approach.
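To make the two-stage idea described in the abstract concrete, the sketch below is a minimal, hypothetical PyTorch illustration, not the authors' implementation: it uses supervised imitation targets in place of reinforcement-learning rollouts, an FGSM-style perturbation of the state feature as the adversary, and invented module names, dimensions, and action-space sizes.

```python
# Hypothetical sketch of the two-stage scheme (all names and sizes are assumptions).
# Stage 1: train an encoder and source-action head so the policy stays correct under
#          adversarially perturbed state features (robust representation learning).
# Stage 2: freeze the encoder and fit a new head for a different action space
#          using far fewer adaptation samples.
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Maps a (flattened, toy) visual observation to a state feature."""
    def __init__(self, obs_dim=128, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, obs):
        return self.net(obs)


class ActionHead(nn.Module):
    """Policy head over a discrete action space of a given size."""
    def __init__(self, feat_dim=64, num_actions=6):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_actions)

    def forward(self, feat):
        return self.fc(feat)


def adversarial_feature(encoder, head, obs, actions, eps=0.1):
    """FGSM-style perturbation of the state feature that increases the policy loss."""
    feat = encoder(obs).detach().requires_grad_(True)
    loss = F.cross_entropy(head(feat), actions)
    grad = torch.autograd.grad(loss, feat)[0]
    return (feat + eps * grad.sign()).detach()


# --- Stage 1: robust representation learning in the original action space ---
encoder, head_src = Encoder(), ActionHead(num_actions=6)
opt1 = torch.optim.Adam(list(encoder.parameters()) + list(head_src.parameters()), lr=1e-3)
for _ in range(200):  # toy loop; random targets stand in for navigation rollouts
    obs = torch.randn(32, 128)
    actions = torch.randint(0, 6, (32,))
    adv_feat = adversarial_feature(encoder, head_src, obs, actions)
    loss = (F.cross_entropy(head_src(encoder(obs)), actions)
            + F.cross_entropy(head_src(adv_feat), actions))  # stay correct under perturbation
    opt1.zero_grad()
    loss.backward()
    opt1.step()

# --- Stage 2: adapt to a new, mismatched action space with few samples ---
head_new = ActionHead(num_actions=3)   # e.g., coarser control actions at deployment
for p in encoder.parameters():
    p.requires_grad_(False)            # reuse the robust state features as-is
opt2 = torch.optim.Adam(head_new.parameters(), lr=1e-3)
for _ in range(50):                    # far fewer adaptation steps than stage 1
    obs = torch.randn(32, 128)
    actions = torch.randint(0, 3, (32,))
    loss = F.cross_entropy(head_new(encoder(obs)), actions)
    opt2.zero_grad()
    loss.backward()
    opt2.step()
```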
Pages: 1959-1970
Number of pages: 12