Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems

Cited by: 23

Authors:

Pang, Bo [1]

Jiang, Zhong-Ping [1]

Affiliations:

[1] New York University, Tandon School of Engineering, Department of Electrical and Computer Engineering, Control and Networks Lab, Brooklyn, NY 11201, USA

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2023 / Vol. 68 / No. 4

Funding:

U.S. National Science Foundation;

Keywords:

Stochastic processes; Stochastic systems; Optimal control; Heuristic algorithms; Reinforcement learning; Process control; Performance analysis; Adaptive optimal control; data-driven control; policy iteration; reinforcement learning; robustness; stochastic control; STABILITY; STATE;

DOI:

10.1109/TAC.2022.3172250

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed, which is able to find iteratively near-optimal policies of the adaptive optimal stationary control problem directly from input/state data without explicitly identifying any system matrices, starting from an initial admissible control policy. The solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one, under mild conditions. The application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and effectiveness.
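The abstract builds on policy iteration started from an initial admissible (stabilizing) control policy. As a point of reference only, the sketch below runs classical model-based policy iteration (Kleinman's algorithm) on a scalar, noise-free linear-quadratic problem with known parameters. It is not the optimistic least-squares-based policy iteration of the article, which is off-policy, data-driven, and handles additive and multiplicative noise. The function names and the parameters a, b, q, r, k0 are illustrative assumptions, not taken from the article.

```python
import math


def policy_iteration_scalar(a, b, q, r, k0, iterations=10):
    """Model-based policy iteration for dx/dt = a*x + b*u with u = -k*x
    and running cost q*x**2 + r*u**2 (scalar, deterministic baseline)."""
    if a - b * k0 >= 0:
        raise ValueError("initial gain k0 must be stabilizing: a - b*k0 < 0")
    k = k0
    history = []
    for _ in range(iterations):
        # Policy evaluation: solve the scalar Lyapunov equation
        #   2*(a - b*k)*p + q + r*k**2 = 0   for the cost coefficient p.
        p = (q + r * k * k) / (2.0 * (b * k - a))
        # Policy improvement: new feedback gain from the evaluated cost.
        k = b * p / r
        history.append((p, k))
    return history


def riccati_scalar(a, b, q, r):
    """Positive root of the scalar Riccati equation
    2*a*p - (b**2 / r)*p**2 + q = 0, used here only as a reference value."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
```

With the assumed values a = b = q = r = 1 and k0 = 2, the first evaluation step gives p = 2.5, and the iterates approach the Riccati solution 1 + sqrt(2).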


Pages: 2383-2390

Page count: 8

References: 53 in total
