Approximate Dynamic Programming: Combining Regional and Local State Following Approximations

Cited by: 23
Authors
Deptula, Patryk [1 ]
Rosenfeld, Joel A. [2 ]
Kamalapurkar, Rushikesh [3 ]
Dixon, Warren E. [1 ]
Affiliations
[1] Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL 32611 USA
[2] Vanderbilt Univ, Dept Elect Engn & Comp Sci, 221 Kirkland Hall, Nashville, TN 37235 USA
[3] Oklahoma State Univ, Sch Mech & Aerosp Engn, Stillwater, OK 74074 USA
Funding
U.S. National Science Foundation
Keywords
Data-driven control; local estimation; nonlinear control; optimal control; reinforcement learning; continuous-time systems; tracking; design
DOI
10.1109/TNNLS.2018.2808102
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
An infinite-horizon optimal regulation problem for a control-affine deterministic system is solved online using a local state following (StaF) kernel and a regional model-based reinforcement learning (R-MBRL) method to approximate the value function. Unlike traditional methods such as R-MBRL, which aim to approximate the value function over a large compact set, the StaF kernel approach approximates the value function in a local neighborhood of the current state as it travels within a compact set. In this paper, the value function is approximated using a state-dependent convex combination of the StaF-based and R-MBRL-based approximations. As the state enters a neighborhood containing the origin, the value function transitions from being approximated by the StaF approach to the R-MBRL approach. Semiglobal uniformly ultimately bounded (SGUUB) convergence of the system states to the origin is established using a Lyapunov-based analysis. Simulation results are provided for two-, three-, six-, and ten-state dynamical systems to demonstrate the scalability and performance of the developed method.
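To make the blending idea concrete, the following is a minimal sketch (not from the paper) of a state-dependent convex combination V(x) = lam(x) * V_StaF(x) + (1 - lam(x)) * V_RMBRL(x). The Gaussian kernels standing in for the StaF kernel, the polynomial regional basis, the switching radii r_inner and r_outer, and the piecewise-linear switching function lam are all illustrative assumptions; the paper's actual kernel and transition construction are not specified in this record.

```python
import numpy as np

def staf_value(x, centers_fn, weights):
    """Hypothetical local approximation: kernels whose centers follow
    the current state x (centers_fn returns state-dependent centers)."""
    centers = centers_fn(x)  # shape (k, n): k centers near the state
    # Gaussian kernels as a stand-in for the StaF kernel family.
    feats = np.exp(-np.sum((x - centers) ** 2, axis=1))
    return weights @ feats

def rmbrl_value(x, basis_fn, weights):
    """Hypothetical regional approximation: a fixed basis valid over a
    large compact set, as in R-MBRL."""
    return weights @ basis_fn(x)

def switching_weight(x, r_inner=0.5, r_outer=1.0):
    """lam = 0 inside the inner neighborhood of the origin (regional
    estimate dominates), lam = 1 far away (local estimate dominates),
    linear in between. The shape of lam is an assumption."""
    r = np.linalg.norm(x)
    return np.clip((r - r_inner) / (r_outer - r_inner), 0.0, 1.0)

def blended_value(x, staf_args, rmbrl_args, r_inner=0.5, r_outer=1.0):
    """State-dependent convex combination of the two approximations."""
    lam = switching_weight(x, r_inner, r_outer)
    return lam * staf_value(x, *staf_args) + (1.0 - lam) * rmbrl_value(x, *rmbrl_args)

# Example usage with toy weights and bases (all values hypothetical).
x = np.array([0.8, -0.3])
centers_fn = lambda x: x + 0.1 * np.eye(len(x))  # kernels centered just off the state
basis_fn = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])  # polynomial basis
v = blended_value(x, staf_args=(centers_fn, np.ones(2)), rmbrl_args=(basis_fn, np.ones(3)))
print(v)
```

Near the origin (norm below r_inner) the regional estimate is used exclusively, and far from it (norm above r_outer) the local StaF estimate takes over, mirroring the transition the abstract describes.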
Pages: 2154-2166
Page count: 13