Policy Learning with an Efficient Black-Box Optimization Algorithm

Cited by: 1

Authors:

Hwangbo, Jemin ^{[1]}

Gehring, Christian ^{[1]}

Sommer, Hannes ^{[1]}

Siegwart, Roland ^{[1]}

Buchli, Jonas ^{[2]}

Affiliations:

[1] Swiss Fed Inst Technol, Autonomous Syst Lab, Zurich, Switzerland

[2] Swiss Fed Inst Technol, Agile & Dexterous Robot Lab, Zurich, Switzerland

Source:

INTERNATIONAL JOURNAL OF HUMANOID ROBOTICS | 2015 / Vol. 12 / No. 03

Funding:

Swiss National Science Foundation;

Keywords:

Policy optimization; robotic learning; black-box optimization;

DOI:

10.1142/S0219843615500292

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Robotic learning on real hardware requires an efficient algorithm that minimizes the number of trials needed to learn an optimal policy. Prolonged use of hardware causes wear and tear on the system and demands more attention from an operator. To this end, we present a novel black-box optimization algorithm, Reward Optimization with Compact Kernels and fast natural gradient regression (ROCK*). Our algorithm updates its knowledge immediately after a single trial and is able to extrapolate in a controlled manner. These features make fast and safe learning on real hardware possible. The performance of our method is evaluated with standard benchmark functions that are commonly used to test optimization algorithms. We also present three different robotic optimization examples using ROCK*. The first robotic example is on a simulated robot arm, the second is on a real articulated legged system, and the third is on a simulated quadruped robot with 12 actuated joints. ROCK* outperforms the current state-of-the-art algorithms in all tasks, sometimes even by an order of magnitude.

Pages: 20

References (28 in total):

[1]

Amari S, 1998, INT CONF ACOUST SPEE, P1213, DOI 10.1109/ICASSP.1998.675489

[2]

[Anonymous], 2013, Paladyn, Journal of Behavioral Robotics

[3]

[Anonymous], 2002, P ADV NEURAL INFORM

[4]

[Anonymous], 2009, P 11 ANN C COMPANION, DOI 10.1145/1570256.1570333

[5]

Bosman PAN, 2008, LECT NOTES COMPUT SC, V5199, P133, DOI 10.1007/978-3-540-87700-4_14

[6] Learning variable impedance control [J].

Buchli, Jonas ;

Stulp, Freek ;

Theodorou, Evangelos ;

Schaal, Stefan .

INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2011, 30 (07) :820-833

[7]

Fankhauser P, 2013, IEEE INT C INT ROBOT, P188, DOI 10.1109/IROS.2013.6696352

[8]

Gehring C, 2014, IEEE INT C ROB AUT H

[9]

Gehring C, 2013, IEEE INT CONF ROBOT, P3287, DOI 10.1109/ICRA.2013.6631035

[10] Completely derandomized self-adaptation in evolution strategies [J].

Hansen, N ;

Ostermeier, A .

EVOLUTIONARY COMPUTATION, 2001, 9 (02) :159-195
