Learning to pour with a robot arm combining goal and shape learning for dynamic movement primitives

Cited by: 73
Authors
Tamosiunaite, Minija [1 ,2 ]
Nemec, Bojan [3 ]
Ude, Ales [3 ]
Woergoetter, Florentin [1 ]
Affiliations
[1] Univ Gottingen, Inst Phys Biophys 3, Bernstein Ctr Computat Neurosci, D-37077 Gottingen, Germany
[2] Vytautas Magnus Univ, Dept Informat, LT-44404 Kaunas, Lithuania
[3] Jozef Stefan Inst, Dept Automat Biocybernet & Robot, Ljubljana 1000, Slovenia
Keywords
Reinforcement learning; PI2-method; Natural actor critic; Value function approximation; Dynamic movement primitives;
DOI
10.1016/j.robot.2011.07.004
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
When describing robot motion with dynamic movement primitives (DMPs), goal (trajectory endpoint), shape, and temporal scaling parameters are used. In reinforcement learning with DMPs, the goal and temporal scaling parameters are usually predefined and only the weights shaping the DMP are learned. Many tasks exist, however, where the best goal position is not known a priori and must itself be learned. Here we therefore specifically address the question of how to combine goal and shape parameter learning simultaneously. This is a difficult problem because learning of the two parameter sets can easily interfere destructively. We apply value function approximation techniques for goal learning and direct policy search methods for shape learning. Specifically, we use "policy improvement with path integrals" (PI2) and the "natural actor critic" for the policy search. We solve a learning-to-pour-liquid task both in simulation and on a PA-10 robot arm. Results for learning from scratch, learning initialized by human demonstration, and modifying the tool for the learned DMPs are presented. We observe that the combination of goal and shape learning is stable and robust over large parameter regimes. Learning converges quickly even in the presence of disturbances, which makes this combined method suitable for robotic applications. (C) 2011 Elsevier B.V. All rights reserved.
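The abstract distinguishes a DMP's goal, shape, and temporal scaling parameters. As a rough illustration only (not the authors' implementation; the parameterization, basis-width heuristic, and constants below are common textbook choices, not taken from the paper), a minimal one-dimensional DMP integrator showing where each parameter enters might look like:

```python
import numpy as np

def integrate_dmp(w, g, y0=0.0, tau=1.0, dt=0.002, duration=1.0,
                  alpha=25.0, beta=6.25, alpha_x=8.0):
    """Integrate a minimal 1-D discrete DMP and return the trajectory.

    w   -- weights of the Gaussian basis functions (the shape parameters)
    g   -- goal, i.e. the trajectory endpoint
    tau -- temporal scaling factor
    """
    n = len(w)
    # Basis-function centres spaced along the exponentially decaying phase.
    c = np.exp(-alpha_x * np.linspace(0.0, 1.0, n))
    h = n**2 / c  # width heuristic: narrower basis functions late in the phase
    x, y, z = 1.0, y0, 0.0  # phase variable, position, scaled velocity
    traj = [y]
    for _ in range(int(duration / dt)):
        psi = np.exp(-h * (x - c) ** 2)                     # basis activations
        f = (psi @ w) / (psi.sum() + 1e-10) * x * (g - y0)  # forcing term
        z += dt * (alpha * (beta * (g - y) - z) + f) / tau  # transformation system
        y += dt * z / tau
        x += dt * (-alpha_x * x) / tau                      # canonical system
        traj.append(y)
    return np.array(traj)
```

With all shape weights at zero the forcing term vanishes and the critically damped point attractor simply converges to the goal `g`; learning the weights `w` bends the transient toward a demonstrated or reward-optimal shape, while learning `g` moves the endpoint itself, which is the combination studied in the paper.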
Pages: 910-922
Page count: 13