Learning Optimal Control Policy for Unknown Discrete-Time Systems

Cited by: 2

Authors:

Lai, Jing [1]

Xiong, Junlin [1]

Affiliations:

[1] Univ Sci & Technol China, Dept Automat, Hefei 230026, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS II-EXPRESS BRIEFS | 2023, Vol. 70, No. 11

Keywords:

Model-free; stabilizing control; data-driven; reinforcement learning; iteration

DOI:

10.1109/TCSII.2023.3279309

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

This brief studies the optimal control policy learning problem for discrete-time linear systems. A data-driven, model-free algorithm is proposed that uses data matrices of the augmented system state and increases the discount factor. The control gains generated by the proposed algorithm are proven to converge to the optimal gain. Compared with existing work, the model-free algorithm avoids dependence on an initial stabilizing control policy and the use of the Kronecker product. Numerical examples are provided to illustrate the proposed algorithm and the analysis results.
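The role of the discount factor can be illustrated with a minimal sketch. It is not the authors' data-driven algorithm: it assumes a known scalar model and uses ordinary value iteration on the discounted Riccati recursion. The system values `a`, `b`, the weights `q`, `r`, and the function name `discounted_lqr_gain` are all hypothetical, chosen only to show the discounted gain approaching the undiscounted optimal gain as the discount factor is raised toward 1.

```python
def discounted_lqr_gain(a, b, q, r, gamma, iters=2000):
    """Scalar discounted LQR by value iteration (model-based illustration).

    Assumed system: x[k+1] = a*x[k] + b*u[k]
    Assumed cost:   sum_k gamma**k * (q*x[k]**2 + r*u[k]**2)
    Returns (p, k): value function p*x**2 and policy u = -k*x.
    """
    p = 0.0  # start from the zero value function
    for _ in range(iters):
        # Discounted Riccati recursion, obtained by minimizing
        # q*x^2 + r*u^2 + gamma*p*(a*x + b*u)^2 over u.
        p = q + gamma * a * a * p - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)
    k = gamma * a * b * p / (r + gamma * b * b * p)
    return p, k


# Hypothetical open-loop unstable plant (|a| > 1) with unit weights.
a, b, q, r = 1.2, 1.0, 1.0, 1.0

# Raise the discount factor toward 1 and record the resulting gains.
gammas = [0.5, 0.8, 0.95, 1.0]
gains = [discounted_lqr_gain(a, b, q, r, g)[1] for g in gammas]

# Distance of each discounted gain from the undiscounted (gamma = 1) gain.
gaps = [abs(k - gains[-1]) for k in gains]

for g, k in zip(gammas, gains):
    print(f"gamma={g:.2f}  gain={k:.4f}  closed-loop pole={a - b * k:.4f}")
```

In this sketch a small discount factor yields a smaller gain, and the gain moves toward the undiscounted optimum as the discount factor grows, which is the intuition behind starting from a heavily discounted problem rather than from a known stabilizing policy.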


Pages: 4191-4195

Page count: 5
