Safe Reinforcement Learning for Zero-Sum Games of Hypersonic Flight Vehicles

Times Cited: 0
Authors
Shi, Lei [1 ,2 ,3 ]
Wang, Xuesong [1 ,2 ,3 ]
Cheng, Yuhu [1 ,2 ,3 ]
Affiliations
[1] China Univ Min & Technol, Engn Res Ctr Intelligent Control Underground Space, Minist Educ, Xuzhou 221116, Peoples R China
[2] China Univ Min & Technol, Xuzhou Key Lab Artificial Intelligence & Big Data, Xuzhou 221116, Peoples R China
[3] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Power capacitors; Aerodynamics; Vehicle dynamics; Safety; Mathematical models; Game theory; Reinforcement learning; Games; Elevators; Stability criteria; Safe reinforcement learning; hypersonic flight vehicle; zero-sum game; actor-critic-disturbance; ADAPTIVE-CONTROL; SYSTEMS; MODEL;
DOI
10.1109/TVT.2024.3426326
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
This article presents a safe reinforcement learning algorithm for the zero-sum game (ZSG) problem of hypersonic flight vehicles within an actor-critic-disturbance framework. First, a barrier-function-based system transformation is proposed that converts the original safe control problem with full-state constraints into an equivalent unconstrained optimization problem. Then, an actor-critic-disturbance structure for adaptive optimal learning is developed to solve the ZSG problem online while guaranteeing safety and stability. Furthermore, novel learning rules for the network weights, based on the experience replay technique, are presented; these not only make the convergence of the network weights more stable but also accelerate the convergence speed. The stability of the closed-loop system and the uniform ultimate boundedness of the weight estimation errors are then established via the Lyapunov method. Finally, a simulation example verifies the effectiveness of the proposed safe reinforcement learning algorithm.
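The barrier-function transformation summarized in the abstract can be sketched as follows. This is a minimal illustration assuming the logarithmic barrier function commonly used in the safe-RL literature for full-state constraints; the specific form and the bound names `a`, `b` are assumptions for illustration, not taken from this article.

```python
import numpy as np

def barrier_transform(x, a, b):
    """Map a constrained state x in (a, b), with a < 0 < b, to an
    unconstrained coordinate s; s tends to +/-infinity as x approaches
    the bounds, so keeping s finite keeps x inside (a, b)."""
    return np.log((b / a) * (a - x) / (b - x))

def barrier_inverse(s, a, b):
    """Recover the constrained state from the unconstrained coordinate
    (the algebraic inverse of barrier_transform)."""
    return a * b * (1.0 - np.exp(s)) / (b - a * np.exp(s))
```

For example, with bounds `a = -1`, `b = 2`, the origin maps to the origin (`barrier_transform(0.0, -1.0, 2.0)` is `0`), so an unconstrained optimal-control problem posed in `s` implicitly enforces the state constraint on `x`.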
Pages: 191-200
Page count: 10
Cited References
34 items total
  • [1] Barrier Lyapunov function-based adaptive control for hypersonic flight vehicles
    An, Hao
    Xia, Hongwei
    Wang, Changhong
    [J]. NONLINEAR DYNAMICS, 2017, 88 (03) : 1833 - 1853
  • [2] Basar T., 1995, Dynamic Noncooperative Game Theory
  • [3] Nonlinear longitudinal dynamical model of an air-breathing hypersonic vehicle
    Bolender, Michael A.
    Doman, David B.
    [J]. JOURNAL OF SPACECRAFT AND ROCKETS, 2007, 44 (02) : 374 - 387
  • [4] Finite-time nonsingular terminal sliding mode control-based fuzzy smooth-switching coordinate strategy for AHV-VGI
    Dou, Liqian
    Du, Miaomiao
    Mao, Qi
    Zong, Qun
    [J]. AEROSPACE SCIENCE AND TECHNOLOGY, 2020, 106
  • [5] Asymptotic tracking control for constrained nonstrict-feedback MIMO nonlinear systems via parameter compensations
    Du, Peihao
    Pan, Yingnan
    Chadli, Mohammed
    Zhao, Shiyi
    [J]. INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2020, 30 (08) : 3365 - 3381
  • [6] Nonlinear Robust Adaptive Control of Flexible Air-Breathing Hypersonic Vehicles
    Fiorentini, Lisa
    Serrani, Andrea
    Bolender, Michael A.
    Doman, David B.
    [J]. JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2009, 32 (02) : 402 - 417
  • [7] Reinforcement Learning-Based Cooperative Optimal Output Regulation via Distributed Adaptive Internal Model
    Gao, Weinan
    Mynuddin, Mohammed
    Wunsch, Donald C.
    Jiang, Zhong-Ping
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (10) : 5229 - 5240
  • [8] García J, 2015, J MACH LEARN RES, V16, P1437
  • [9] Optimal and Autonomous Control Using Reinforcement Learning: A Survey
    Kiumarsi, Bahare
    Vamvoudakis, Kyriakos G.
    Modares, Hamidreza
    Lewis, Frank L.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2042 - 2062
  • [10] Lewis F. L., 2012, OPTIMAL CONTROL