Approximate Dynamic Programming: Combining Regional and Local State Following Approximations

Cited by: 23
Authors
Deptula, Patryk [1 ]
Rosenfeld, Joel A. [2 ]
Kamalapurkar, Rushikesh [3 ]
Dixon, Warren E. [1 ]
Affiliations
[1] Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL 32611 USA
[2] Vanderbilt Univ, Dept Elect Engn & Comp Sci, 221 Kirkland Hall, Nashville, TN 37235 USA
[3] Oklahoma State Univ, Sch Mech & Aerosp Engn, Stillwater, OK 74074 USA
Funding
U.S. National Science Foundation
Keywords
Data-driven control; local estimation; nonlinear control; optimal control; reinforcement learning; continuous-time systems; tracking; design
DOI
10.1109/TNNLS.2018.2808102
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
An infinite-horizon optimal regulation problem for a control-affine deterministic system is solved online using a local state following (StaF) kernel and a regional model-based reinforcement learning (R-MBRL) method to approximate the value function. Unlike traditional methods such as R-MBRL, which aim to approximate the value function over a large compact set, the StaF kernel approach approximates the value function in a local neighborhood of the current state as it travels within a compact set. In this paper, the value function is approximated using a state-dependent convex combination of the StaF-based and R-MBRL-based approximations. As the state enters a neighborhood containing the origin, the value function transitions from being approximated by the StaF approach to the R-MBRL approach. Semiglobal uniformly ultimately bounded (SGUUB) convergence of the system states to the origin is established using a Lyapunov-based analysis. Simulation results are provided for two-, three-, six-, and ten-state dynamical systems to demonstrate the scalability and performance of the developed method.
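To make the blending idea concrete, the following is a minimal sketch (not from the paper) of a state-dependent convex combination V(x) = lam(x) * V_StaF(x) + (1 - lam(x)) * V_RMBRL(x). The Gaussian kernels standing in for the StaF kernel, the polynomial regional basis, the switching radii r_inner and r_outer, and the piecewise-linear switching function lam are all illustrative assumptions; the paper's actual kernel and transition construction are not specified in this record.

```python
import numpy as np

def staf_value(x, centers_fn, weights):
    """Hypothetical local approximation: kernels whose centers follow
    the current state x (centers_fn returns state-dependent centers)."""
    centers = centers_fn(x)  # shape (k, n): k centers near the state
    # Gaussian kernels as a stand-in for the StaF kernel family.
    feats = np.exp(-np.sum((x - centers) ** 2, axis=1))
    return weights @ feats

def rmbrl_value(x, basis_fn, weights):
    """Hypothetical regional approximation: a fixed basis valid over a
    large compact set, as in R-MBRL."""
    return weights @ basis_fn(x)

def switching_weight(x, r_inner=0.5, r_outer=1.0):
    """lam = 0 inside the inner neighborhood of the origin (regional
    estimate dominates), lam = 1 far away (local estimate dominates),
    linear in between. The shape of lam is an assumption."""
    r = np.linalg.norm(x)
    return np.clip((r - r_inner) / (r_outer - r_inner), 0.0, 1.0)

def blended_value(x, staf_args, rmbrl_args, r_inner=0.5, r_outer=1.0):
    """State-dependent convex combination of the two approximations."""
    lam = switching_weight(x, r_inner, r_outer)
    return lam * staf_value(x, *staf_args) + (1.0 - lam) * rmbrl_value(x, *rmbrl_args)

# Example usage with toy weights and bases (all values hypothetical).
x = np.array([0.8, -0.3])
centers_fn = lambda x: x + 0.1 * np.eye(len(x))  # kernels centered just off the state
basis_fn = lambda x: np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])  # polynomial basis
v = blended_value(x, staf_args=(centers_fn, np.ones(2)), rmbrl_args=(basis_fn, np.ones(3)))
print(v)
```

Near the origin (norm below r_inner) the regional estimate is used exclusively, and far from it (norm above r_outer) the local StaF estimate takes over, mirroring the transition the abstract describes.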
Pages: 2154-2166
Page count: 13