Reinforcement learning via offline trajectory planning based on iteratively approximated models

Cited by: 0
Authors
Pritzkoleit, Max [1]
Heedt, Robert [1]
Knoll, Carsten [1]
Roebenack, Klaus [1]
Affiliations
[1] Tech Univ Dresden, Fak Elektrotech & Informat Tech, Inst Regelungs & Steuerungstheorie, Dresden, Germany
Keywords
trajectory planning; reinforcement learning; neural networks; model approximation; tracking control
DOI
10.1515/auto-2020-0024
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Subject Classification Code
0812
Abstract
In this paper, we use artificial neural networks (ANNs) to approximate the dynamics of nonlinear (mechanical) systems. These iteratively approximated neural system models are used in offline trajectory planning to compute an optimal feedback law, which is then applied to the real system. This model-based reinforcement learning (RL) approach is evaluated on the swing-up manoeuvre of the cart-pole system and shows a significant gain in data efficiency compared to model-free RL approaches. Furthermore, we present experimental results from a test bench. The proposed algorithm is capable of approximating an optimal feedback law for the system after only a few trials.
Pages: 612-624
Page count: 13
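
For context, the iterative loop the abstract describes (collect data on the real system, refit a neural dynamics model, plan a trajectory offline against that model, apply it, repeat) can be sketched in a few lines of Python. The sketch below is illustrative only, not the authors' method: it substitutes a simple pendulum for the cart-pole, PyTorch for whatever framework the paper uses, and plain gradient-based shooting for the paper's optimal-control planner (which additionally yields a feedback law). All names (real_step, fit_model, plan) are hypothetical.

import math
import torch
import torch.nn as nn

DT, HORIZON = 0.05, 60

def real_step(x, u):
    # Ground-truth pendulum dynamics, standing in for the physical system.
    theta, omega = x[0], x[1]
    return torch.stack([theta + DT * omega,
                        omega + DT * (-9.81 * torch.sin(theta) + u)])

# Small feed-forward ANN mapping (state, input) -> next state.
model = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 2))
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)

def fit_model(xu, x_next, epochs=200):
    # Supervised regression of the one-step dynamics x_{k+1} = f(x_k, u_k).
    for _ in range(epochs):
        model_opt.zero_grad()
        loss = nn.functional.mse_loss(model(xu), x_next)
        loss.backward()
        model_opt.step()

def plan(x0):
    # Offline trajectory optimisation on the learned model via simple
    # gradient-based shooting; the paper's planner also yields a feedback
    # law, which this open-loop stand-in omits.
    u = torch.zeros(HORIZON, requires_grad=True)
    u_opt = torch.optim.Adam([u], lr=0.05)
    for _ in range(100):
        u_opt.zero_grad()
        x, cost = x0, 0.0
        for k in range(HORIZON):
            x = model(torch.cat([x, u[k:k + 1]]))
            cost = cost + (x[0] - math.pi) ** 2 + 1e-3 * u[k] ** 2
        cost.backward()
        u_opt.step()
    return u.detach()

x0 = torch.zeros(2)           # hanging-down equilibrium; goal is theta = pi
inputs, targets = [], []
for trial in range(5):        # "after only a few trials"
    u_seq = plan(x0) if trial > 0 else 0.1 * torch.randn(HORIZON)
    x = x0
    for k in range(HORIZON):  # roll out on the (simulated) real system
        x_next = real_step(x, u_seq[k])
        inputs.append(torch.cat([x, u_seq[k:k + 1]]))
        targets.append(x_next)
        x = x_next
    fit_model(torch.stack(inputs), torch.stack(targets))

The data efficiency claimed in the abstract comes from the structure of this loop: every trial's transitions are kept and the model is refit on the growing data set, so planning improves after each interaction with the real system rather than after many random rollouts.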