An Online Training Method for Augmenting MPC with Deep Reinforcement Learning

Cited by: 15
Authors
Bellegarda, Guillaume [1]
Byl, Katie [1]
Affiliations
[1] Univ Calif Santa Barbara UCSB, Robot Lab, Dept Elect & Comp Engn, Santa Barbara, CA 93106 USA
Source
2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2020
Keywords
MODEL
DOI
10.1109/IROS45743.2020.9341021
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Recent breakthroughs in both reinforcement learning and trajectory optimization have made significant advances toward real-world robotic system deployment. Reinforcement learning (RL) can be applied to many problems without needing any modeling of, or intuition about, the system, at the cost of high sample complexity and an inability to prove guarantees about the learned policies. Trajectory optimization (TO), on the other hand, allows for stability and robustness analyses of generated motions and trajectories, but is only as good as the often over-simplified derived model, and its computation times may be prohibitively expensive for real-time control, for example in contact-rich environments. This paper seeks to combine the benefits of these two areas while mitigating their drawbacks by (1) decreasing RL sample complexity through existing knowledge of the problem encoded in real-time optimal control, and (2) allowing online policy deployment at any point in the training process by using the TO (MPC) solution as a baseline, worst-case action, while continuously improving the combined learned-optimized policy with deep RL. The method is evaluated on the task of successively navigating a car model to a series of goal destinations over slippery terrain as fast as possible, where drifting allows the system to change direction more quickly while maintaining high speed.
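
The abstract only outlines how the MPC baseline and the learned policy are combined. As a rough illustration of one plausible reading (a deep RL policy learning an additive correction around the MPC action, so that the MPC solution bounds worst-case behavior during online training), here is a minimal Python sketch using NumPy stand-ins. Every name in it (mpc_action, RLPolicy, combined_action), the proportional control law, and the linear policy with its REINFORCE-style update are assumptions for illustration, not the authors' implementation.

import numpy as np

# All components below are illustrative stand-ins, not the paper's code:
# the baseline controller and the policy update rule are assumed.

def mpc_action(state):
    # Stand-in for the real-time optimal controller: a proportional law
    # steering the first two state components (e.g., position error) to
    # zero. This plays the role of the baseline / worst-case action.
    return -0.5 * state[:2]

class RLPolicy:
    # Tiny linear policy standing in for a deep RL policy.
    def __init__(self, state_dim, action_dim, lr=1e-3):
        self.W = np.zeros((action_dim, state_dim))
        self.lr = lr

    def correction(self, state):
        # Learned additive correction applied on top of the MPC baseline.
        return self.W @ state

    def update(self, state, noise, advantage):
        # Crude REINFORCE-style step: reinforce exploration noise that
        # outperformed the MPC-only baseline (positive advantage).
        self.W += self.lr * advantage * np.outer(noise, state)

def combined_action(state, policy, rng, explore_std=0.1):
    # MPC supplies a sensible action at every step, so the combined policy
    # is deployable at any point during training; RL explores around it.
    base = mpc_action(state)
    noise = rng.normal(0.0, explore_std, size=base.shape)
    return base + policy.correction(state) + noise, noise

# One illustrative control step:
rng = np.random.default_rng(0)
policy = RLPolicy(state_dim=4, action_dim=2)
state = np.array([1.0, -0.5, 0.2, 0.0])
action, noise = combined_action(state, policy, rng)
policy.update(state, noise, advantage=0.3)  # advantage would come from rollouts

Because the baseline term is always present, zeroing out the learned correction (as at initialization, where W = 0) recovers plain MPC, which is what makes the combined policy safe to deploy at any point in training.
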
Pages: 5453-5459
Page count: 7