Online adaptive optimal control algorithm based on synchronous integral reinforcement learning with explorations

Cited by: 7
Authors
Guo, Lei [1 ]
Zhao, Han [1 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Reinforcement learning; Neural networks; Adaptive control; Actor-critic; Explorations; TIME LINEAR-SYSTEMS; NONLINEAR-SYSTEMS;
DOI
10.1016/j.neucom.2022.11.055
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this study, we present a novel algorithm, based on synchronous policy iteration, for solving the continuous-time infinite-horizon optimal control problem for input-affine system dynamics. The integral reinforcement is measured as an excitation signal to estimate the solution to the Hamilton-Jacobi-Bellman equation. In addition, the proposed method is completely model-free; that is, no a priori knowledge of the system is required. Using the adaptive tuning laws, the actor and critic neural networks simultaneously approximate the optimal value function and the optimal policy. A persistence-of-excitation condition is required to guarantee the convergence of the two networks. Unlike traditional policy iteration algorithms, the proposed method eliminates the restriction that an initial admissible policy must be supplied. The effectiveness of the proposed algorithm is verified through numerical simulations. (c) 2022 Elsevier B.V. All rights reserved.
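For orientation, the following is a minimal sketch of the standard integral reinforcement learning (IRL) formulation for input-affine dynamics that abstracts like this one build on; the notation (f, g, Q, R, and the reinforcement interval T) is generic to this literature and is assumed here, not taken from the paper itself.

% Input-affine dynamics and infinite-horizon value function
% (illustrative notation, standard in the IRL literature):
\begin{align}
  \dot{x} &= f(x) + g(x)\,u, \\
  V(x(t)) &= \int_{t}^{\infty} \left( Q(x(\tau)) + u(\tau)^{\top} R\, u(\tau) \right) d\tau.
\end{align}
% Integral Bellman equation over a reinforcement interval T; evaluating
% this along measured trajectories is what removes the need for explicit
% knowledge of f and g:
\begin{equation}
  V(x(t)) = \int_{t}^{t+T} \left( Q(x) + u^{\top} R\, u \right) d\tau + V(x(t+T)).
\end{equation}
% Hamilton-Jacobi-Bellman equation and the associated optimal policy:
\begin{align}
  0 &= Q(x) + \nabla V^{*}(x)^{\top} f(x)
     - \tfrac{1}{4}\, \nabla V^{*}(x)^{\top} g(x) R^{-1} g(x)^{\top} \nabla V^{*}(x), \\
  u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x).
\end{align}

In synchronous schemes of this kind, a critic network approximates V and an actor network approximates u*, with both weight sets tuned online from the residual of the integral Bellman equation; the persistence-of-excitation condition mentioned in the abstract is what guarantees that this residual carries enough information for the network weights to converge.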
Pages: 250-261
Page count: 12