Bias-policy iteration based adaptive dynamic programming for unknown continuous-time linear systems

Cited by: 27
Authors
Jiang, Huaiyuan [1]
Zhou, Bin [1]
Affiliations
[1] Harbin Inst Technol, Ctr Control Theory & Guidance Technol, POB 416, Harbin 150001, Peoples R China
Keywords
Adaptive dynamic programming; Policy iteration; Unknown systems; Optimal control; Data-driven control; DESIGN;
DOI
10.1016/j.automatica.2021.110058
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
In this paper, a bias-policy iteration method for solving the data-driven optimal control problem of unknown continuous-time linear systems is proposed. First, a model-based bias-policy iteration method is given and its convergence is rigorously proved. A data-driven implementation of the proposed method is then introduced that does not use the system matrices. The relationship between the proposed method and the existing policy iteration and value iteration methods is also analyzed. Compared with the existing policy iteration method, the most significant advantage of the proposed method is that, by adding a bias parameter, the condition on the initial admissible controller can be further relaxed. Simulation examples verify the effectiveness of the proposed bias-policy iteration method. © 2021 Elsevier Ltd. All rights reserved.
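
For context on the "initial admissible controller" condition mentioned in the abstract, the following is a minimal sketch of the classical model-based policy iteration (Kleinman's algorithm) for a scalar continuous-time linear-quadratic problem. It is not the bias-policy iteration of the paper, whose update rule is not given in this record. The function names, the scalar system x' = a*x + b*u, the cost weights q and r, and the numerical values are illustrative assumptions.

```python
import math


def kleinman_scalar(a, b, q, r, k0, iters=20):
    """Classical policy iteration for x' = a*x + b*u, cost = integral of
    q*x^2 + r*u^2, with policy u = -k*x. Illustrative sketch only; this is
    the existing method the abstract compares against, not bias-policy
    iteration."""
    k = k0
    p = None
    for _ in range(iters):
        a_cl = a - b * k  # closed-loop dynamics under the current gain
        if a_cl >= 0:
            # The classical method needs a stabilizing (admissible) gain.
            raise ValueError("gain is not stabilizing (not admissible)")
        # Policy evaluation: solve 2*a_cl*p + q + r*k^2 = 0 for p.
        p = (q + r * k * k) / (-2.0 * a_cl)
        # Policy improvement: k = b*p / r.
        k = b * p / r
    return p, k


def riccati_scalar(a, b, q, r):
    """Stabilizing root of the scalar Riccati equation
    2*a*p - (b^2/r)*p^2 + q = 0, used as a reference value."""
    return r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)


if __name__ == "__main__":
    # Open-loop unstable plant (a > 0); k0 = 2 is stabilizing, k0 = 0 is not.
    p, k = kleinman_scalar(a=1.0, b=1.0, q=1.0, r=1.0, k0=2.0)
    print(p, riccati_scalar(1.0, 1.0, 1.0, 1.0))
```

In this sketch, starting from a non-stabilizing gain such as k0 = 0 raises an error, which illustrates the requirement that, according to the abstract, the bias parameter is intended to relax.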
Pages: 12