Adaptive optimal control for a class of continuous-time affine nonlinear systems with unknown internal dynamics

Cited by: 77
Authors
Liu, Derong [1]
Yang, Xiong [1]
Li, Hongliang [1]
Affiliation
[1] Chinese Academy of Sciences, Institute of Automation, State Key Laboratory of Management and Control for Complex Systems, Beijing 100190, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Adaptive dynamic programming; Reinforcement learning; Policy iteration; Adaptive optimal control; Neural network; Online control; Nonlinear system; Approximation
DOI
10.1007/s00521-012-1249-y
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
This paper develops an online algorithm based on policy iteration for optimal control with infinite horizon cost for continuous-time nonlinear systems. In the present method, a discounted value function is employed, which is considered to be a more general case for optimal control problems. Meanwhile, without knowledge of the internal system dynamics, the algorithm can converge uniformly online to the optimal control, which is the solution of the modified Hamilton-Jacobi-Bellman equation. By means of two neural networks, the algorithm is able to find suitable approximations of both the optimal control and the optimal cost. The uniform convergence to the optimal control is shown, guaranteeing the stability of the nonlinear system. A simulation example is provided to illustrate the effectiveness and applicability of the present approach.
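As a rough illustration of policy iteration with a discounted cost, the sketch below uses a scalar linear system x' = a*x + b*u with cost equal to the integral of exp(-gamma*t)*(q*x^2 + r*u^2). This is an assumption made for illustration only: the system, the parameter values, and the function names are hypothetical and are not taken from the paper. Unlike the paper's online method, this sketch uses the internal dynamics (the coefficient a) directly and replaces the two neural networks with the closed form V(x) = p*x^2 and u = -k*x.

```python
import math


def discounted_policy_iteration(a, b, q, r, gamma, k0=0.0, iters=20):
    """Policy iteration for x' = a*x + b*u with discounted quadratic cost.

    Hypothetical scalar example: the value function is V(x) = p*x^2 and
    the policy is u = -k*x. The model (a, b) is assumed known here.
    """
    k = k0
    p = None
    for _ in range(iters):
        a_cl = a - b * k                  # closed-loop dynamics under u = -k*x
        denom = gamma - 2.0 * a_cl
        if denom <= 0.0:
            raise ValueError("policy gives an unbounded discounted cost")
        # Policy evaluation: (q + r*k^2) - gamma*p + 2*p*a_cl = 0
        p = (q + r * k * k) / denom
        # Policy improvement: minimize the discounted Hamiltonian over u
        k = b * p / r
    return p, k


def discounted_riccati_root(a, b, q, r, gamma):
    """Positive root of q + (2*a - gamma)*p - (b^2 / r)*p^2 = 0."""
    s = 2.0 * a - gamma
    return r * (s + math.sqrt(s * s + 4.0 * b * b * q / r)) / (2.0 * b * b)


# Illustrative parameter values (assumed, not from the paper).
p, k = discounted_policy_iteration(a=-1.0, b=1.0, q=1.0, r=1.0, gamma=0.1)
p_star = discounted_riccati_root(a=-1.0, b=1.0, q=1.0, r=1.0, gamma=0.1)
```

For these values the iterated p agrees with the closed-form root p_star to numerical precision, which is the scalar linear analogue of converging to the solution of a discounted Hamilton-Jacobi-Bellman equation.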
Pages: 1843-1850
Number of pages: 8