Survey of Model-Based Reinforcement Learning: Applications on Robotics

Times cited: 351

Authors:

Polydoros, Athanasios S. [1]

Nalpantidis, Lazaros [1]

Affiliation:

[1] Aalborg Univ, Dept Mech & Mfg Engn, AC Meyers Vaenge 15, DK-2450 Copenhagen SV, Denmark

Source:

JOURNAL OF INTELLIGENT & ROBOTIC SYSTEMS | 2017 / Vol. 86 / No. 02

Keywords:

Intelligent robotics; Machine learning; Model-based reinforcement learning; Robot learning; Policy search; Transition models; Reward functions; Exploration

DOI:

10.1007/s10846-017-0468-y

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Reinforcement learning is an appealing approach for allowing robots to learn new tasks. The relevant literature reveals a plethora of methods but, at the same time, makes clear the lack of implementations that deal with real-life challenges. Current expectations raise the demand for adaptable robots. We argue that, by employing model-based reinforcement learning, the (currently limited) adaptability of robotic systems can be expanded. Model-based reinforcement learning also exhibits advantages that make it more applicable to real-life use cases than model-free methods. Thus, this survey covers model-based methods that have been applied in robotics. We categorize them based on how the optimal policy is derived, how the return function is defined, the type of the transition model, and the learned task. Finally, we discuss the applicability of model-based reinforcement learning approaches in new applications, taking into consideration the state of the art in both algorithms and hardware.
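
The core loop that the abstract's categories refer to (learn a transition model from interaction data, define a return from rewards, then derive a policy from the learned model) can be illustrated with a minimal tabular sketch. Everything here is a hypothetical choice for illustration and is not taken from the survey: the 5-state corridor, its 0.8 move-success probability, the discount factor `gamma=0.9`, and the function names `env_step`, `learn_model` and `value_iteration`. Value iteration is only one of many ways to derive a policy from a learned model.

```python
import random
from collections import defaultdict

# Hypothetical toy task standing in for a robot: a 1-D corridor of N_STATES cells.
# Action 0 moves left, action 1 moves right; being in the last cell yields reward 1.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def env_step(s, a):
    """True dynamics (unknown to the learner): the move succeeds with prob. 0.8, else stay."""
    if random.random() < 0.8:
        s_next = min(s + 1, GOAL) if a == 1 else max(s - 1, 0)
    else:
        s_next = s
    r = 1.0 if s_next == GOAL else 0.0
    return s_next, r


def learn_model(n_samples=5000, seed=0):
    """Estimate a tabular transition model P(s'|s,a) and mean reward R(s,a) from random interactions."""
    random.seed(seed)
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    visits = defaultdict(int)
    for _ in range(n_samples):
        s = random.randrange(N_STATES)
        a = random.choice(ACTIONS)
        s_next, r = env_step(s, a)
        counts[(s, a)][s_next] += 1
        reward_sum[(s, a)] += r
        visits[(s, a)] += 1
    P = {sa: {s2: c / visits[sa] for s2, c in nxt.items()} for sa, nxt in counts.items()}
    R = {sa: reward_sum[sa] / visits[sa] for sa in visits}
    return P, R


def q_value(P, R, V, s, a, gamma):
    """Expected discounted return of taking a in s under the *learned* model."""
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in P[(s, a)].items())


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Derive state values and a greedy policy by planning on the learned model only."""
    V = [0.0] * N_STATES
    while True:
        delta = 0.0
        for s in range(N_STATES):
            best = max(q_value(P, R, V, s, a, gamma) for a in ACTIONS)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    policy = [max(ACTIONS, key=lambda a: q_value(P, R, V, s, a, gamma)) for s in range(N_STATES)]
    return V, policy


if __name__ == "__main__":
    P, R = learn_model()
    V, policy = value_iteration(P, R)
    print("values:", [round(v, 2) for v in V])
    print("policy:", policy)
```

Separating model learning from planning is what distinguishes this from a model-free method: once `P` and `R` are estimated, the policy is computed without further interaction with the (simulated) system, which is the sample-efficiency argument usually made for model-based approaches on real robots.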

Pages: 153-173

Page count: 21

Cited references: 94
