Deep Off-Policy Iterative Learning Control

Citations: 0
Authors
Gurumurthy, Swaminathan [1]
Kolter, J. Zico [1,2]
Manchester, Zachary [1]
Affiliations
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
[2] Bosch Ctr AI, Sunnyvale, CA USA
Source
LEARNING FOR DYNAMICS AND CONTROL CONFERENCE, 2023, Vol. 211
Keywords
Reinforcement Learning; Iterative Learning Control; Differentiable Simulators
DOI
Not available
CLC Classification
TP [Automation and Computer Technology]
Discipline Code
0812
Abstract
Reinforcement learning has emerged as a powerful paradigm for learning control policies while making few assumptions about the environment. However, this lack of assumptions in popular RL algorithms also leads to sample inefficiency. Furthermore, we often have access to a simulator that can provide approximate gradients of the rewards and dynamics of the environment. Iterative learning control (ILC) approaches have been shown to be very efficient at learning policies by using approximate simulator gradients to speed up optimization, but they lack the generality of reinforcement learning approaches. In this paper, we take inspiration from ILC and propose an update equation for the value-function gradients (computed using the dynamics Jacobians and reward gradients obtained from an approximate simulator) to speed up value-function and policy optimization. We add this update to an off-the-shelf off-policy reinforcement learning algorithm and demonstrate that using the value-gradient update leads to a significant improvement in sample efficiency (and sometimes better performance), both when learning from scratch in a new environment and when fine-tuning a pre-trained policy in a new environment. Moreover, we observe that policies pre-trained in the simulator using the simulator Jacobians obtain better zero-shot transfer performance and adapt much faster in a new environment.
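The record does not reproduce the paper's actual update equation, so the following is only an illustrative, hypothetical sketch of the general idea the abstract describes: propagating value-function gradients backward along a trajectory using dynamics Jacobians and reward gradients obtained from an approximate simulator. The function name `value_gradient_backup` and the discounted chain-rule recursion are assumptions for illustration, not the authors' method.

```python
import numpy as np

def value_gradient_backup(dr_dx, df_dx, gamma=0.99):
    """Illustrative backward recursion for value-function gradients.

    dr_dx: list of T reward gradients w.r.t. state, each shape (n,)
    df_dx: list of T dynamics Jacobians w.r.t. state, each shape (n, n)
    Returns a list g with g[t] approximating dV/dx at step t, via the
    chain rule: g[t] = dr_dx[t] + gamma * df_dx[t]^T @ g[t+1].
    """
    T = len(dr_dx)
    g = [None] * T
    g[-1] = dr_dx[-1]  # terminal step: value gradient is just the reward gradient
    for t in range(T - 2, -1, -1):
        # Discounted chain rule through the (approximate) simulator dynamics.
        g[t] = dr_dx[t] + gamma * df_dx[t].T @ g[t + 1]
    return g

# Toy 1-D example: constant reward gradient, identity dynamics, gamma = 0.5.
grads = value_gradient_backup([np.array([1.0])] * 3, [np.eye(1)] * 3, gamma=0.5)
```

In an actual implementation, `dr_dx` and `df_dx` would come from a differentiable simulator's autodiff machinery, and the resulting gradient targets would regularize or warm-start the critic's learned value gradients.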
Pages: 13