Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning

Cited: 2
Authors
Ma, Xiao [1 ]
Yuan, Yuan [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Astronaut, Xian 710072, Peoples R China
Source
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS | 2024, Vol. 361, No. 7
Keywords
Robust hierarchical game; Reinforcement learning; Model-free; Off-policy;
DOI
10.1016/j.jfranklin.2024.106711
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game subject to incomplete information and input constraints. The robust hierarchical game exhibits the characteristics of a Stackelberg-Nash (SN) game, whose equilibrium points are designated Stackelberg-Nash-Saddle equilibrium (SNE) points. The RL algorithm employs an off-policy method, which addresses the input constraints by using an excitation input, rather than the policies under real-time update, as the control input. Moreover, a model-free method is incorporated into the off-policy RL algorithm to cope with the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm that obtains approximate SNE policies of the robust hierarchical game under incomplete information and input constraints. Furthermore, the convergence and effectiveness of the algorithm are guaranteed by proving the equivalence of the Bellman equations of the nominal SNE policies and the approximate SNE policies. Finally, a simulation is provided to verify the advantages of the developed algorithm.
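The off-policy, model-free idea described in the abstract can be illustrated, in a much simpler single-player linear-quadratic setting, by a Q-learning policy-iteration sketch: data are collected under an exciting behavior input while the target policy is evaluated and improved purely from data, with no use of the system matrices. The plant, cost weights, and gains below are hypothetical illustrations, not taken from the paper, and the sketch omits the game-theoretic (leader-follower and saddle-point) structure entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable 2nd-order discrete-time plant (illustrative only)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)          # state cost weight
Rc = np.eye(1)          # input cost weight
n, m = 2, 1

def collect(K, N=400):
    """Run the plant under an exciting behavior input u = -Kx + noise."""
    X, U, Xn = [], [], []
    x = rng.standard_normal(n)
    for _ in range(N):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # excitation keeps data rich
        xn = A @ x + B @ u
        X.append(x); U.append(u); Xn.append(xn)
        x = xn if np.linalg.norm(xn) < 1e3 else rng.standard_normal(n)
    return np.array(X), np.array(U), np.array(Xn)

def q_evaluate(K, X, U, Xn):
    """Least-squares solution of the Q-function Bellman equation for gain K,
    using data only (no A, B):  z'Hz - z_+' H z_+ = x'Qx + u'Ru."""
    Phi, y = [], []
    for x, u, xn in zip(X, U, Xn):
        z  = np.concatenate([x, u])                 # behavior state-input pair
        zn = np.concatenate([xn, -K @ xn])          # target policy at next state
        Phi.append(np.kron(z, z) - np.kron(zn, zn))
        y.append(x @ Qc @ x + u @ Rc @ u)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = h.reshape(n + m, n + m)
    return 0.5 * (H + H.T)                          # enforce symmetry

K = np.zeros((m, n))                                # stabilizing initial gain
for _ in range(10):                                 # policy iteration
    X, U, Xn = collect(K)
    H = q_evaluate(K, X, U, Xn)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])       # greedy improvement step

print("learned gain K =", K)
```

Because the behavior input (initial gain plus exploration noise) differs from the evaluated target policy, the scheme is off-policy in the same sense as the abstract: the excitation input, not the policy under real-time update, drives the system while learning proceeds.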
Pages: 16