Adaptive Optimal Control of Unknown Nonlinear Systems via Homotopy-Based Policy Iteration

Cited by: 7
Authors
Chen, Ci [1 ,2 ]
Lewis, Frank L. [3 ]
Xie, Kan [4 ,5 ]
Xie, Shengli [6 ,7 ]
Affiliations
[1] Guangdong Univ Technol, Sch Automat, Guangdong Key Lab IoT Informat Technol, Guangzhou 510006, Peoples R China
[2] Minist Educ, Key Lab Intelligent Informat Proc & Syst Integrat, Guangzhou 510006, Peoples R China
[3] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX 76118 USA
[4] Guangdong Univ Technol, Sch Automat, Guangzhou 510006, Peoples R China
[5] 111 Ctr Intelligent Batch Mfg Based IoT Technol, Guangzhou 510006, Peoples R China
[6] Guangdong Univ Technol, Sch Automat, Key Lab Intelligent Detect & Internet Things Mfg, Guangzhou 510006, Peoples R China
[7] Guangdong Hong Kong Macao Joint Laboratory Smart, Guangzhou, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive optimal control; homotopic; initial admissible control; policy iteration (PI); reinforcement learning (RL); approximation; design;
DOI
10.1109/TAC.2023.3339660
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
As an efficient technique in reinforcement learning, policy iteration (PI) requires an initial admissible (or, for linear systems, stabilizing) control policy, a requirement that renders existing PI-based results model dependent. To attain completely data-driven adaptive optimal control, this article integrates a homotopic design with PI for unknown continuous-time nonlinear systems. Technically, we leverage a homotopic constant to construct an artificially stable system on which PI can be initialized with zero control. Following a homotopic strategy, we recursively update the artificial system so that it gradually recovers the original system. This ultimately yields an admissible control policy in a finite number of iterations without any model-based initialization. Once the admissible control is obtained, the proposed homotopic PI inherits the fast convergence of traditional PI and learns the optimal control solution from data measured on the unknown nonlinear system.
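To make the homotopic idea concrete, below is a minimal sketch for the linear-quadratic special case, assuming a known pair (A, B) purely for illustration; the paper's method is data-driven and nonlinear, so the model-based Lyapunov solves here merely stand in for its learning steps, and all function and variable names are illustrative, not from the paper. PI starts from the zero policy on an artificially damped system A - alpha*I, and the homotopic constant alpha is driven to zero while each stage warm-starts the next.

```python
# Hypothetical sketch of homotopy-based policy iteration (LQ case).
# The known (A, B) below replaces the paper's data-driven learning steps.
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_pi(A, B, Q, R, K0, iters=100, tol=1e-10):
    """Classical policy iteration from a stabilizing initial gain K0."""
    K = K0
    for _ in range(iters):
        Acl = A - B @ K
        # Policy evaluation: solve Acl' P + P Acl + Q + K' R K = 0.
        P = solve_continuous_lyapunov(Acl.T, -(Q + K.T @ R @ K))
        # Policy improvement.
        K_next = np.linalg.solve(R, B.T @ P)
        if np.linalg.norm(K_next - K) < tol:
            break
        K = K_next
    return K, P

def homotopy_pi(A, B, Q, R):
    """Initialize PI with zero control on the damped system A - alpha*I,
    then drive the homotopic constant alpha to zero, warm-starting each
    stage with the previous stage's gain."""
    n, m = B.shape
    # Any alpha exceeding the largest real part of eig(A) makes
    # A - alpha*I Hurwitz, so the zero policy is admissible at the start.
    alpha = max(np.real(np.linalg.eigvals(A)).max(), 0.0) + 1.0
    K = np.zeros((m, n))
    while True:
        K, P = kleinman_pi(A - alpha * np.eye(n), B, Q, R, K)
        if alpha == 0.0:
            return K, P  # the artificial system has recovered the original
        # Shrinking alpha by less than the closed-loop stability margin
        # keeps K admissible for the next, less-damped stage.
        margin = -np.real(
            np.linalg.eigvals(A - alpha * np.eye(n) - B @ K)).max()
        alpha = max(alpha - 0.9 * margin, 0.0)

# Example: an open-loop unstable system where zero control alone fails.
A = np.array([[0.0, 1.0], [2.0, -1.0]])
B = np.array([[0.0], [1.0]])
K_opt, P_opt = homotopy_pi(A, B, Q=np.eye(2), R=np.eye(1))
print("optimal gain:", K_opt)
```

Warm-starting each stage with the previous gain is what removes the need for a model-based initial admissible policy: only the first, heavily damped stage must be stabilized, and there the zero policy suffices by construction.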
Pages: 3396-3403
Number of pages: 8