Model-free reinforcement learning from expert demonstrations: a survey

Cited by: 0

Authors:

Jorge Ramírez

Wen Yu

Adolfo Perrusquía

Affiliations:

[1] CINVESTAV-IPN (National Polytechnic Institute), Departamento de Control Automático

[2] Cranfield University, School of Aerospace, Transport and Manufacturing

Source:

Artificial Intelligence Review | 2022 / Vol. 55

Keywords:

Reinforcement learning; Imitation learning; Learning from demonstrations; Behavioral learning; Demonstrations

DOI:

Not available

Abstract:

Reinforcement learning from expert demonstrations (RLED) lies at the intersection of imitation learning and reinforcement learning and seeks to take advantage of both learning approaches. RLED uses demonstration trajectories to improve sample efficiency in high-dimensional spaces. It is a new and promising approach to behavioral learning through demonstrations from an expert teacher. RLED considers two possible knowledge sources to guide the reinforcement learning process: prior knowledge and online knowledge. This survey focuses on novel methods for model-free reinforcement learning guided by demonstrations, which are commonly but not necessarily provided by humans. The methods are analyzed and classified according to the impact of the demonstrations. Challenges, applications, and promising approaches for improving the discussed methods are also covered.

Pages: 3213-3241

Page count: 28
