Data-Driven Policy Iteration for Nonlinear Optimal Control Problems

Times cited: 4

Authors:

Possieri, Corrado [1]

Sassano, Mario [2]

Affiliations:

[1] CNR, Ist Anal Sistemi Informat A Ruberti, I-00185 Rome, Italy

[2] Univ Roma Tor Vergata, Dipartimento Ingn Civile & Ingn Informat, I-00133 Rome, Italy

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2023 / Vol. 34 / No. 10

Keywords:

Optimal control; Costs; Neural networks; Real-time systems; Nonlinear dynamical systems; Closed loop systems; Learning systems; Data-driven methods; nonlinear systems; optimal control; policy iteration; ADAPTIVE OPTIMAL-CONTROL; TIME LINEAR-SYSTEMS; IDENTIFICATION; ALGORITHM; DESIGN;

DOI:

10.1109/TNNLS.2022.3142501

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

The design of optimal control laws for nonlinear systems is tackled without knowledge of the underlying plant and of a functional description of the cost function. The proposed data-driven method is based only on real-time measurements of the state of the plant and of the (instantaneous) value of the reward signal and relies on a combination of ideas borrowed from the theories of optimal and adaptive control problems. As a result, the architecture implements a policy iteration strategy in which, hinging on the use of neural networks, the policy evaluation step and the computation of the relevant information instrumental for the policy improvement step are performed in a purely continuous-time fashion. Furthermore, the desirable features of the design method, including convergence rate and robustness properties, are discussed. Finally, the theory is validated via two benchmark numerical simulations.
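The policy iteration structure described in the abstract (a policy evaluation step alternating with a policy improvement step) can be illustrated on the simplest possible case. The sketch below is not the authors' neural-network algorithm. It assumes a hypothetical scalar linear plant dx/dt = a*x + b*u with quadratic reward q*x^2 + r*u^2, so the value of a linear policy u = -k*x is exactly V(x) = p*x^2 and no neural approximator is needed. Policy evaluation uses only sampled states and reward values, through the integral relation V(x(t)) - V(x(t+T)) = integral of the reward over [t, t+T]. Unlike the paper, the improvement step k = b*p/r assumes that b and r are known. All function names, parameters, and numerical values are illustrative assumptions.

```python
import math


def simulate(a, b, k, x0, dt, steps, q, r):
    """Forward-Euler rollout of dx/dt = a*x + b*u under the policy u = -k*x.

    Returns the sampled states and the instantaneous reward q*x^2 + r*u^2,
    i.e. only the signals assumed to be measurable (hypothetical setup).
    """
    xs, rewards = [x0], []
    x = x0
    for _ in range(steps):
        u = -k * x
        rewards.append(q * x * x + r * u * u)
        x = x + dt * (a * x + b * u)
        xs.append(x)
    return xs, rewards


def evaluate_policy(xs, rewards, dt, window):
    """Policy evaluation from data: least-squares fit of p in V(x) = p*x^2
    using V(x(t)) - V(x(t+T)) = integral of the reward over [t, t+T]."""
    num = den = 0.0
    for start in range(0, len(rewards) - window + 1, window):
        phi = xs[start] ** 2 - xs[start + window] ** 2
        cost = sum(rewards[start:start + window]) * dt  # rectangle rule
        num += phi * cost
        den += phi * phi
    return num / den


def policy_iteration(a, b, q, r, k0, iters=8, dt=1e-3, steps=2000, window=100):
    """Alternate data-based evaluation and gain improvement.

    k0 must be stabilizing (a - b*k0 < 0). The improvement step k = b*p/r
    uses b and r, which the paper's method does not assume to be known.
    """
    k, p = k0, None
    for _ in range(iters):
        xs, rewards = simulate(a, b, k, 1.0, dt, steps, q, r)
        p = evaluate_policy(xs, rewards, dt, window)  # policy evaluation
        k = b * p / r  # policy improvement
    return p, k


def lqr_value(a, b, q, r):
    """Stabilizing root of the scalar Riccati equation
    2*a*p - (b**2/r)*p**2 + q = 0, used only as a reference value."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


p, k = policy_iteration(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
print(f"learned p = {p:.4f}, Riccati p = {lqr_value(1.0, 1.0, 1.0, 1.0):.4f}")
```

Starting from the stabilizing gain k0 = 2 on the open-loop unstable plant a = 1, the learned value coefficient should agree with the Riccati solution 1 + sqrt(2) to about two decimal places; the small residual gap comes from the Euler discretization and the rectangle-rule integral, not from the iteration itself.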


Pages: 7365-7376

Page count: 12

References: 48 in total

