Approximate Dynamic Programming: Combining Regional and Local State Following Approximations

Cited by: 21
Authors
Deptula, Patryk [1 ]
Rosenfeld, Joel A. [2 ]
Kamalapurkar, Rushikesh [3 ]
Dixon, Warren E. [1 ]
Affiliations
[1] Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL 32611 USA
[2] Vanderbilt Univ, Dept Elect Engn & Comp Sci, 221 Kirkland Hall, Nashville, TN 37235 USA
[3] Oklahoma State Univ, Sch Mech & Aerosp Engn, Stillwater, OK 74074 USA
Funding
U.S. National Science Foundation
Keywords
Data-driven control; local estimation; nonlinear control; optimal control; reinforcement learning; CONTINUOUS-TIME; SYSTEMS; TRACKING; DESIGN;
DOI
10.1109/TNNLS.2018.2808102
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
An infinite-horizon optimal regulation problem for a control-affine deterministic system is solved online using a local state following (StaF) kernel and a regional model-based reinforcement learning (R-MBRL) method to approximate the value function. Unlike traditional methods such as R-MBRL, which aim to approximate the value function over a large compact set, the StaF kernel approach approximates the value function in a local neighborhood of the state that travels within a compact set. In this paper, the value function is approximated using a state-dependent convex combination of the StaF-based and R-MBRL-based approximations. As the state enters a neighborhood containing the origin, the value function transitions from being approximated by the StaF approach to being approximated by the R-MBRL approach. Semiglobal uniformly ultimately bounded (SGUUB) convergence of the system states to the origin is established using a Lyapunov-based analysis. Simulation results are provided for two-, three-, six-, and ten-state dynamical systems to demonstrate the scalability and performance of the developed method.
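As a rough illustration of the combination described in the abstract, the Python sketch below blends a local kernel-based estimate with a regional estimate through a state-dependent weight. The blending function `blend`, the Gaussian StaF-style kernels, the quadratic regional stand-in `v_rmbrl`, and all parameter values are illustrative assumptions, not the exact construction used in the paper.

```python
import numpy as np

def blend(x, r=1.0):
    # Smooth weight in [0, 1]: near 0 close to the origin (R-MBRL region),
    # near 1 far from it (StaF region). A hypothetical transition choice.
    return 1.0 - np.exp(-np.dot(x, x) / r**2)

def v_staf(x, centers, weights, width=0.5):
    # Local StaF-style estimate: Gaussian kernels whose centers are assumed
    # to follow the current state.
    k = np.exp(-np.sum((centers - x) ** 2, axis=1) / (2.0 * width**2))
    return weights @ k

def v_rmbrl(x, P):
    # Regional estimate; a simple quadratic stand-in x^T P x.
    return x @ P @ x

def v_combined(x, centers, weights, P):
    # State-dependent convex combination of the two approximations.
    lam = blend(x)
    return lam * v_staf(x, centers, weights) + (1.0 - lam) * v_rmbrl(x, P)

# Example: a two-state system with kernel centers placed around the state.
x = np.array([0.8, -0.3])
centers = x + 0.2 * np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
weights = np.array([0.5, 0.4, 0.6, 0.3])  # placeholder StaF weights
P = np.eye(2)                             # placeholder regional weight matrix
print(v_combined(x, centers, weights, P))
```

Far from the origin, blend(x) is close to 1 and the local StaF-style estimate dominates; as the state nears the origin, blend(x) falls toward 0 and the regional estimate takes over, mirroring the transition the abstract describes.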
Pages: 2154-2166 (13 pages)