Safe Reinforcement Learning for Zero-Sum Games of Hypersonic Flight Vehicles

Times Cited: 0
Authors
Shi, Lei [1 ,2 ,3 ]
Wang, Xuesong [1 ,2 ,3 ]
Cheng, Yuhu [1 ,2 ,3 ]
Affiliations
[1] China Univ Min & Technol, Engn Res Ctr Intelligent Control Underground Space, Minist Educ, Xuzhou 221116, Peoples R China
[2] China Univ Min & Technol, Xuzhou Key Lab Artificial Intelligence & Big Data, Xuzhou 221116, Peoples R China
[3] China Univ Min & Technol, Sch Informat & Control Engn, Xuzhou 221116, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Power capacitors; Aerodynamics; Vehicle dynamics; Safety; Mathematical models; Game theory; Reinforcement learning; Games; Elevators; Stability criteria; Safe reinforcement learning; hypersonic flight vehicle; zero-sum game; actor-critic-disturbance; ADAPTIVE-CONTROL; SYSTEMS; MODEL;
DOI
10.1109/TVT.2024.3426326
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronics and Communication Technology];
Discipline Codes
0808 ; 0809 ;
Abstract
This article presents a safe reinforcement learning algorithm for the zero-sum game (ZSG) problem of hypersonic flight vehicles within an actor-critic-disturbance framework. First, a barrier-function-based system transformation is proposed that converts the original safe control problem with full-state constraints into an equivalent unconstrained optimization problem. Then, an actor-critic-disturbance structure for adaptive optimal learning is developed to solve the ZSG problem online while guaranteeing safety and stability. Furthermore, novel learning rules for the network weights, based on the experience replay technique, are presented; these not only make the convergence of the network weights more stable but also accelerate the convergence speed. The stability of the closed-loop system and the uniform ultimate boundedness of the weight estimation errors are then established via the Lyapunov method. Finally, a simulation example verifies the effectiveness of the proposed safe reinforcement learning algorithm.
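The barrier-function transformation summarized in the abstract can be sketched as follows. This is a minimal illustration assuming the logarithmic barrier function commonly used in the safe-RL literature for full-state constraints; the specific form and the bound names `a`, `b` are assumptions for illustration, not taken from this article.

```python
import numpy as np

def barrier_transform(x, a, b):
    """Map a constrained state x in (a, b), with a < 0 < b, to an
    unconstrained coordinate s; s tends to +/-infinity as x approaches
    the bounds, so keeping s finite keeps x inside (a, b)."""
    return np.log((b / a) * (a - x) / (b - x))

def barrier_inverse(s, a, b):
    """Recover the constrained state from the unconstrained coordinate
    (the algebraic inverse of barrier_transform)."""
    return a * b * (1.0 - np.exp(s)) / (b - a * np.exp(s))
```

For example, with bounds `a = -1`, `b = 2`, the origin maps to the origin (`barrier_transform(0.0, -1.0, 2.0)` is `0`), so an unconstrained optimal-control problem posed in `s` implicitly enforces the state constraint on `x`.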
Pages: 191-200
Page count: 10
Cited References
34 items total
  • [1] Barrier Lyapunov function-based adaptive control for hypersonic flight vehicles
    An, Hao
    Xia, Hongwei
    Wang, Changhong
    [J]. NONLINEAR DYNAMICS, 2017, 88 (03) : 1833 - 1853
  • [2] Basar T., 1995, Dynamic Noncooperative Game Theory
  • [3] Nonlinear longitudinal dynamical model of an air-breathing hypersonic vehicle
    Bolender, Michael A.
    Doman, David B.
    [J]. JOURNAL OF SPACECRAFT AND ROCKETS, 2007, 44 (02) : 374 - 387
  • [4] Finite-time nonsingular terminal sliding mode control-based fuzzy smooth-switching coordinate strategy for AHV-VGI
    Dou, Liqian
    Du, Miaomiao
    Mao, Qi
    Zong, Qun
    [J]. AEROSPACE SCIENCE AND TECHNOLOGY, 2020, 106
  • [5] Asymptotic tracking control for constrained nonstrict-feedback MIMO nonlinear systems via parameter compensations
    Du, Peihao
    Pan, Yingnan
    Chadli, Mohammed
    Zhao, Shiyi
    [J]. INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL, 2020, 30 (08) : 3365 - 3381
  • [6] Nonlinear Robust Adaptive Control of Flexible Air-Breathing Hypersonic Vehicles
    Fiorentini, Lisa
    Serrani, Andrea
    Bolender, Michael A.
    Doman, David B.
    [J]. JOURNAL OF GUIDANCE CONTROL AND DYNAMICS, 2009, 32 (02) : 402 - 417
  • [7] Reinforcement Learning-Based Cooperative Optimal Output Regulation via Distributed Adaptive Internal Model
    Gao, Weinan
    Mynuddin, Mohammed
    Wunsch, Donald C.
    Jiang, Zhong-Ping
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (10) : 5229 - 5240
  • [8] García J, 2015, J MACH LEARN RES, V16, P1437
  • [9] Optimal and Autonomous Control Using Reinforcement Learning: A Survey
    Kiumarsi, Bahare
    Vamvoudakis, Kyriakos G.
    Modares, Hamidreza
    Lewis, Frank L.
    [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (06) : 2042 - 2062
  • [10] Lewis F. L., 2012, OPTIMAL CONTROL