Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems

Cited by: 23
Authors
Pang, Bo [1]
Jiang, Zhong-Ping [1]
Affiliations
[1] NYU, Tandon Sch Engn, Dept Elect & Comp Engn, Control & Networks Lab, Brooklyn, NY 11201 USA
Funding
U.S. National Science Foundation
Keywords
Stochastic processes; Stochastic systems; Optimal control; Heuristic algorithms; Reinforcement learning; Process control; Performance analysis; Adaptive optimal control; data-driven control; policy iteration; robustness; stochastic control; STABILITY; STATE
DOI
10.1109/TAC.2022.3172250
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed, which is able to find iteratively near-optimal policies of the adaptive optimal stationary control problem directly from input/state data without explicitly identifying any system matrices, starting from an initial admissible control policy. The solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one, under mild conditions. The application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and effectiveness.
Pages: 2383-2390
Page count: 8
Related papers
53 in total
[1]  
Åström K.J., 1995, Adaptive Control, 2nd ed.
[2]  
[Anonymous], 2007, Theory of Probability and Random Processes, DOI 10.1098/rstb.2005.1673
[3]  
Basei M, 2022, arXiv:2006.15316
[4]  
Bertsekas D., 2019, Reinforcement Learning and Optimal Control
[5]   Approximate policy iteration: A survey and some new methods [J].
Bertsekas, D.P. .
Journal of Control Theory and Applications, 2011, 9 (3) :310-335
[6]  
Bhattacharya S, 2020, PR MACH LEARN RES, V155, P1814
[7]   CONTINUOUS-TIME ROBUST DYNAMIC PROGRAMMING [J].
Bian, Tao ;
Jiang, Zhong-Ping .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2019, 57 (06) :4150-4174
[8]   Model-Free Robust Optimal Feedback Mechanisms of Biological Motor Control [J].
Bian, Tao ;
Wolpert, Daniel M. ;
Jiang, Zhong-Ping .
NEURAL COMPUTATION, 2020, 32 (03) :562-595
[9]   Adaptive Dynamic Programming for Stochastic Systems With State and Control Dependent Noise [J].
Bian, Tao ;
Jiang, Yu ;
Jiang, Zhong-Ping .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2016, 61 (12) :4170-4175
[10]  
Bitmead RR, 1990, ADAPTIVE OPTIMAL CON