Tutor-Guided Interior Navigation With Deep Reinforcement Learning

Cited by: 4

Authors:

Zeng, Fanyu [1]

Wang, Chen [1]

Ge, Shuzhi Sam [2,3]

Affiliations:

[1] Univ Elect Sci & Technol China, Ctr Robot, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

[2] Natl Univ Singapore, Dept Elect Comp Engn, Singapore, Singapore

[3] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China

Source:

IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS | 2021, Vol. 13, No. 4

Funding:

National Natural Science Foundation of China;

Keywords:

Navigation; Feature extraction; Simultaneous localization and mapping; Task analysis; Reinforcement learning; Robots; Predictive models; Deep reinforcement learning (DRL); navigation; transferability;

DOI:

10.1109/TCDS.2020.3039859

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Traditional reinforcement learning derives its policy from the current system state. However, insufficient system information and scarce rewards limit its applicability, especially in partially observed environments with sparse rewards. In this work, we propose a tutor-student network (TSN) that improves an agent's performance with additional auxiliary information. In the tutor-student framework, a tutor module generates auxiliary information, while a student module refers to the tutor's suggestions during training. The key to our approach is that the tutor provides prior knowledge that is not tied to a specific environment, which helps the student module accelerate learning. We build 12 indoor mazes in ViZDoom, including empty mazes and mazes with obstacles, and compare TSN with advantage actor-critic (A2C). The results show that the proposed network learns navigation faster and obtains higher accumulated rewards. More importantly, our approach can generalize well to new and unseen domains.


Pages: 934-944

Page count: 11

References: 41 in total
