Tutor-Guided Interior Navigation With Deep Reinforcement Learning

Cited by: 4
Authors
Zeng, Fanyu [1]
Wang, Chen [1]
Ge, Shuzhi Sam [2,3]
Affiliations
[1] Univ Elect Sci & Technol China, Ctr Robot, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
[2] Natl Univ Singapore, Dept Elect Comp Engn, Singapore, Singapore
[3] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu 611731, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Navigation; Feature extraction; Simultaneous localization and mapping; Task analysis; Reinforcement learning; Robots; Predictive models; Deep reinforcement learning (DRL); navigation; transferability;
DOI
10.1109/TCDS.2020.3039859
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Traditional reinforcement learning derives its policy from the current system state. However, insufficient state information and scarce rewards limit its applicability, especially in partially observed environments with sparse rewards. In this work, we propose a tutor-student network (TSN) that improves an agent's performance with additional auxiliary information. In the tutor-student framework, a tutor module generates auxiliary information, while a student module refers to the tutor's suggestions during training. The key to our approach is that the tutor provides prior knowledge that is not tied to any specific environment, which helps the student module accelerate learning. We build 12 indoor mazes in ViZDoom, including empty mazes and mazes with obstacles, and evaluate TSN against advantage actor-critic (A2C); the proposed network learns to navigate faster and obtains higher accumulated rewards. More importantly, our approach can generalize well to new and unseen domains.
Pages: 934-944
Number of pages: 11