A Tour of Reinforcement Learning: The View from Continuous Control

Cited by: 376

Authors:

Recht, Benjamin [1]

Affiliations:

[1] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA 94720 USA

Source:

Annual Review of Control, Robotics, and Autonomous Systems, Vol. 2 | 2019 / Volume 2

Keywords:

reinforcement learning; control theory; machine learning; optimization; stochastic approximation; system identification; algorithms; complexity; safe

DOI:

10.1146/annurev-control-053018-023825

Chinese Library Classification:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

This article surveys reinforcement learning from the perspective of optimization and control, with a focus on continuous control applications. It reviews the general formulation, terminology, and typical experimental implementations of reinforcement learning as well as competing solution paradigms. In order to compare the relative merits of various techniques, it presents a case study of the linear quadratic regulator (LQR) with unknown dynamics, perhaps the simplest and best-studied problem in optimal control. It also describes how merging techniques from learning theory and control can provide nonasymptotic characterizations of LQR performance and shows that these characterizations tend to match experimental behavior. In turn, when revisiting more complex applications, many of the observed phenomena in LQR persist. In particular, theory and experiment demonstrate the role and importance of models and the cost of generality in reinforcement learning algorithms. The article concludes with a discussion of some of the challenges in designing learning systems that safely and reliably interact with complex and uncertain environments and how tools from reinforcement learning and control might be combined to approach these challenges.

Pages: 253-279

Page count: 27
