Learning With Stochastic Guidance for Robot Navigation

Cited by: 29
Authors
Xie, Linhai [1 ]
Miao, Yishu [2 ]
Wang, Sen [3 ]
Blunsom, Phil [1 ]
Wang, Zhihua [1 ]
Chen, Changhao [1 ]
Markham, Andrew [1 ]
Trigoni, Niki [1 ]
Affiliations
[1] Univ Oxford, Dept Comp Sci, Oxford OX1 3QD, England
[2] MO Intelligence Ltd, Oxford OX2 7HT, England
[3] Heriot Watt Univ, Sch Engn & Phys Sci, Edinburgh EH14 4AS, Midlothian, Scotland
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Switches; Stochastic processes; Navigation; Robot kinematics; Collision avoidance; Deep deterministic policy gradient (DDPG); deep reinforcement learning (DRL); REINFORCE; robot navigation; REINFORCEMENT; EFFICIENT;
DOI
10.1109/TNNLS.2020.2977924
CLC Number
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Due to the sparse rewards and high degree of environmental variation, reinforcement learning approaches, such as deep deterministic policy gradient (DDPG), are plagued by issues of high variance when applied in complex real-world environments. We present a new framework for overcoming these issues by incorporating a stochastic switch, allowing an agent to choose between high- and low-variance policies. The stochastic switch can be jointly trained with the original DDPG in the same framework. In this article, we demonstrate the power of the framework in a navigation task, where the robot can dynamically choose to learn through exploration or to use the output of a heuristic controller as guidance. Instead of starting from completely random actions, the navigation capability of a robot can be quickly bootstrapped by several simple independent controllers. The experimental results show that with the aid of stochastic guidance, we are able to effectively and efficiently train DDPG navigation policies and achieve significantly better performance than state-of-the-art baseline models.
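The abstract does not spell out how the switch is trained; going by the keywords (REINFORCE, DDPG), the sketch below assumes the switch is a categorical policy over the DDPG actor and a set of heuristic controllers, sampled at each step and updated with REINFORCE, while DDPG learns off-policy from whichever actions were actually executed. All names (StochasticSwitch, act, reinforce_update) and network sizes are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class StochasticSwitch(nn.Module):
    # Categorical policy over candidate controllers: index 0 could be the
    # DDPG actor, the rest heuristic controllers (e.g., PID, obstacle
    # avoidance). Architecture and sizes are illustrative.
    def __init__(self, state_dim: int, n_controllers: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64),
            nn.ReLU(),
            nn.Linear(64, n_controllers),
        )

    def forward(self, state: torch.Tensor):
        dist = torch.distributions.Categorical(logits=self.net(state))
        choice = dist.sample()
        return choice, dist.log_prob(choice)

def act(switch, controllers, state):
    # One environment step: the switch samples which controller produces
    # the executed action; the log-probability is kept for REINFORCE.
    choice, log_prob = switch(state)
    action = controllers[choice.item()](state)
    return action, log_prob

def reinforce_update(optimizer, log_probs, episode_return):
    # Vanilla REINFORCE on the switch: scale each step's log-probability
    # by the episode return (a baseline would further reduce variance).
    loss = -episode_return * torch.stack(log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Under these assumptions, the DDPG critic and actor are updated as usual from a replay buffer of executed transitions, so the heuristic controllers effectively seed the buffer with low-variance, goal-directed experience instead of purely random exploration.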
Pages: 166-176
Page count: 11