Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems

Cited by: 17
Authors
Pang, Bo [1]
Jiang, Zhong-Ping [1]
Affiliations
[1] NYU, Tandon Sch Engn, Dept Elect & Comp Engn, Control & Networks Lab, Brooklyn, NY 11201 USA
Funding
U.S. National Science Foundation
Keywords
Stochastic processes; Stochastic systems; Optimal control; Heuristic algorithms; Reinforcement learning; Process control; Performance analysis; Adaptive optimal control; data-driven control; policy iteration; robustness; stochastic control; STABILITY; STATE
DOI
10.1109/TAC.2022.3172250
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noise, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed. Starting from an initial admissible control policy, the algorithm iteratively finds near-optimal policies for the adaptive optimal stationary control problem directly from input/state data, without explicitly identifying any system matrices. Under mild conditions, the solutions given by the algorithm are proved to converge to a small neighborhood of the optimal solution with probability one. Applying the algorithm to a triple inverted pendulum example validates its feasibility and effectiveness.
Pages: 2383-2390
Number of pages: 8