A reinforcement learning-based scheme for direct adaptive optimal control of linear stochastic systems

Cited by: 14

Authors:

Wong, Wee Chin [1]

Lee, Jay H. [1]

Affiliations:

[1] Georgia Inst Technol, Sch Chem & Biomol Engn, Atlanta, GA 30332 USA

Source:

OPTIMAL CONTROL APPLICATIONS & METHODS | 2010 / Vol. 31 / Issue 04

Keywords:

reinforcement learning; linear systems; stochastic adaptive optimal control; IDENTIFICATION

DOI:

10.1002/oca.915

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Reinforcement learning, in which decision-making agents learn optimal policies through interaction with their environment, is an attractive paradigm for model-free, adaptive controller design. However, results for systems with continuous state and action variables are rare. In this paper, we present convergence results for optimal linear quadratic control of discrete-time linear stochastic systems. This work can be viewed as a generalization of previous work on deterministic linear systems. Key differences between the algorithms for deterministic and stochastic systems are highlighted through examples. The usefulness of the algorithm is demonstrated through a nonlinear chemostat bioreactor case study. Copyright (C) 2009 John Wiley & Sons, Ltd.

Citation

Pages: 365-374

Page count: 10

References (19 total):

[1] Al-Tamimi, Asma; Lewis, Frank L.; Abu-Khalaf, Murad. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control [J]. AUTOMATICA, 2007, 43 (03): 473-481
[2] [Anonymous], 2000, Dynamic programming and optimal control
[3] Bellman R. E., 1957, Dynamic programming. Princeton landmarks in mathematics
[4] BRADKTE S, 1994, THESIS U MASSACHUSET
[5] BRADTKE SJ, 1994, PROCEEDINGS OF THE 1994 AMERICAN CONTROL CONFERENCE, VOLS 1-3, P3475
[6] Christopher John Cornish Hellaby Watkins, 1989, Learning from Delayed Rewards
[7] Goodwin G. C., 1984, Adaptive filtering prediction and control
[8] Guay, M; Dier, R; Hahn, J; McLellan, PJ. Effect of process nonlinearity on linear quadratic regulator performance [J]. JOURNAL OF PROCESS CONTROL, 2005, 15 (01): 113-124
[9] HAGEN S, 2001, THESIS U AMSTERDAM
[10] Herzallah, Randa. Adaptive critic methods for stochastic systems with input-dependent noise [J]. AUTOMATICA, 2007, 43 (08): 1355-1362
