Learning With Stochastic Guidance for Robot Navigation

Cited by: 29
Authors
Xie, Linhai [1 ]
Miao, Yishu [2 ]
Wang, Sen [3 ]
Blunsom, Phil [1 ]
Wang, Zhihua [1 ]
Chen, Changhao [1 ]
Markham, Andrew [1 ]
Trigoni, Niki [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford OX1 3QD, England
[2] MO Intelligence Ltd, Oxford OX2 7HT, England
[3] Heriot Watt Univ, Sch Engn & Phys Sci, Edinburgh EH14 4AS, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Switches; Stochastic processes; Navigation; Robot kinematics; Collision avoidance; Deep deterministic policy gradient (DDPG); deep reinforcement learning (DRL); REINFORCE; robot navigation; REINFORCEMENT; EFFICIENT;
DOI
10.1109/TNNLS.2020.2977924
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Due to the sparse rewards and high degree of environmental variation, reinforcement learning approaches, such as deep deterministic policy gradient (DDPG), are plagued by issues of high variance when applied in complex real-world environments. We present a new framework for overcoming these issues by incorporating a stochastic switch, allowing an agent to choose between high- and low-variance policies. The stochastic switch can be jointly trained with the original DDPG in the same framework. In this article, we demonstrate the power of the framework in a navigation task, where the robot can dynamically choose to learn through exploration or to use the output of a heuristic controller as guidance. Instead of starting from completely random actions, the navigation capability of a robot can be quickly bootstrapped by several simple independent controllers. The experimental results show that with the aid of stochastic guidance, we are able to effectively and efficiently train DDPG navigation policies and achieve significantly better performance than state-of-the-art baseline models.
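The abstract does not spell out how the switch is trained; going by the keywords (REINFORCE, DDPG), the sketch below assumes the switch is a categorical policy over the DDPG actor and a set of heuristic controllers, sampled at each step and updated with REINFORCE, while DDPG learns off-policy from whichever actions were actually executed. All names (StochasticSwitch, act, reinforce_update) and network sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StochasticSwitch(nn.Module):
    # Categorical policy over candidate controllers: index 0 could be the
    # DDPG actor, the rest heuristic controllers (e.g., PID, obstacle
    # avoidance). Architecture and sizes are illustrative.
    def __init__(self, state_dim: int, n_controllers: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_controllers),
        )

    def forward(self, state: torch.Tensor):
        dist = torch.distributions.Categorical(logits=self.net(state))
        choice = dist.sample()
        return choice, dist.log_prob(choice)

def act(switch, controllers, state):
    # One environment step: the switch samples which controller produces
    # the executed action; the log-probability is kept for REINFORCE.
    choice, log_prob = switch(state)
    action = controllers[choice.item()](state)
    return action, log_prob

def reinforce_update(optimizer, log_probs, episode_return):
    # Vanilla REINFORCE on the switch: scale each step's log-probability
    # by the episode return (a baseline would further reduce variance).
    loss = -episode_return * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Under these assumptions, the DDPG critic and actor are updated as usual from a replay buffer of executed transitions, so the heuristic controllers effectively seed the buffer with low-variance, goal-directed experience instead of purely random exploration.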
Pages: 166-176
Page count: 11