Modified λ-Policy Iteration Based Adaptive Dynamic Programming for Unknown Discrete-Time Linear Systems

Times cited: 6

Authors:

Jiang, Huaiyuan [1]

Zhou, Bin [1]

Duan, Guang-Ren [1]

Affiliation:

[1] Harbin Inst Technol, Ctr Control Theory & Guidance Technol, Harbin 150001, Peoples R China

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2024 / Vol. 35 / No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

Adaptive dynamic programming (ADP); data-driven control; discrete-time systems; modified λ-policy iteration (λ-PI); policy iteration; unknown systems; STABILIZATION;

DOI:

10.1109/TNNLS.2023.3244934

Chinese Library Classification (CLC) number:

TP18 [Theory of artificial intelligence];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this article, the λ-policy iteration (λ-PI) method for the optimal control problem of discrete-time linear systems is reconsidered and restated from a new perspective. First, the traditional λ-PI method is recalled, and some new properties of it are established. Based on these properties, a modified λ-PI algorithm is introduced and its convergence is proven. Compared with existing results, the initial condition is further relaxed. A data-driven implementation is then constructed, together with a new matrix rank condition for verifying its feasibility. A simulation example verifies the effectiveness of the proposed method.
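As a rough illustration of the traditional λ-PI recursion that the abstract recalls, the following Python sketch runs λ-PI on a scalar discrete-time LQR problem with a known model: x_{t+1} = a·x_t + b·u_t, policy u_t = −k·x_t, stage cost q·x² + r·u². Each evaluation step solves p_new = q + r·k² + (1 − λ)·(a − b·k)²·p + λ·(a − b·k)²·p_new, and each improvement step takes the gain that is greedy with respect to p_new. The scalar setting, the helper names `lambda_pi_scalar` and `scalar_dare`, and the example numbers are hypothetical assumptions. This is not the authors' modified λ-PI (which relaxes the initial condition) nor their data-driven implementation (which works without knowledge of the system model).

```python
import math


def lambda_pi_scalar(a, b, q, r, lam, k0, p0=0.0, iters=200):
    """Sketch of traditional lambda-policy iteration for the scalar system
    x_{t+1} = a*x_t + b*u_t with policy u_t = -k*x_t and cost sum(q*x^2 + r*u^2).

    Hypothetical helper: lam = 1 gives policy iteration, lam = 0 value iteration.
    Assumes |sqrt(lam)*(a - b*k)| < 1 for every gain k visited.
    """
    k, p = k0, p0
    for _ in range(iters):
        ak = a - b * k  # closed-loop coefficient under the current policy
        denom = 1.0 - lam * ak * ak
        if denom <= 0.0:
            raise ValueError("sqrt(lam)*(a - b*k) is not Schur stable")
        # Lambda-weighted policy evaluation: solve
        #   p_new = q + r*k^2 + (1 - lam)*ak^2*p + lam*ak^2*p_new
        p = (q + r * k * k + (1.0 - lam) * ak * ak * p) / denom
        # Policy improvement: gain that is greedy with respect to p_new
        k = b * p * a / (r + b * b * p)
    return p, k


def scalar_dare(a, b, q, r):
    """Positive root of the scalar discrete-time algebraic Riccati equation
    b^2 p^2 + (r - q b^2 - a^2 r) p - q r = 0 (hypothetical helper for checking)."""
    c = r - q * b * b - a * a * r
    return (-c + math.sqrt(c * c + 4.0 * b * b * q * r)) / (2.0 * b * b)


if __name__ == "__main__":
    a, b, q, r = 2.0, 1.0, 1.0, 1.0  # open-loop unstable example (hypothetical values)
    p_star = scalar_dare(a, b, q, r)
    for lam in (0.0, 0.5, 1.0):
        # k0 = 1.5 gives a - b*k0 = 0.5, so the initial policy is stabilizing
        p, k = lambda_pi_scalar(a, b, q, r, lam, k0=1.5)
        print(f"lam={lam}: p={p:.6f}, k={k:.6f}, |p - p*|={abs(p - p_star):.1e}")
```

Setting lam = 1 recovers ordinary policy iteration and lam = 0 recovers value iteration; intermediate values blend the two, which is why the sketch checks that sqrt(lam)·(a − b·k) stays inside the unit circle before each evaluation step. For the example values, all three choices of lam should approach the Riccati solution p* = 2 + √5.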

Pages: 3291–3301

Number of pages: 11

Cited references: 52 in total
