Robust hierarchical games of linear discrete-time systems based on off-policy model-free reinforcement learning

Cited: 2
Authors
Ma, Xiao [1 ]
Yuan, Yuan [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Astronaut, Xian 710072, Peoples R China
Source
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS | 2024, Vol. 361, No. 7
Keywords
Robust hierarchical game; Reinforcement learning; Model-free; Off-policy;
DOI
10.1016/j.jfranklin.2024.106711
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
An off-policy model-free reinforcement learning (RL) algorithm is proposed for a robust hierarchical game subject to incomplete information and input constraints. The robust hierarchical game exhibits the characteristics of a Stackelberg-Nash (SN) game, whose equilibrium points are designated Stackelberg-Nash-Saddle equilibrium (SNE) points. The RL algorithm employs an off-policy method, which addresses the input constraints by using an excitation input, rather than the policies under real-time update, as the control input. Moreover, a model-free method is incorporated into the off-policy RL algorithm to cope with the challenge posed by incomplete information. The goal of this paper is to develop an off-policy model-free RL algorithm that obtains approximate SNE policies of the robust hierarchical game under incomplete information and input constraints. Furthermore, the convergence and effectiveness of the algorithm are guaranteed by proving the equivalence of the Bellman equations of the nominal SNE policies and the approximate SNE policies. Finally, a simulation is provided to verify the advantages of the developed algorithm.
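The off-policy, model-free idea described in the abstract can be illustrated, in a much simpler single-player linear-quadratic setting, by a Q-learning policy-iteration sketch: data are collected under an exciting behavior input while the target policy is evaluated and improved purely from data, with no use of the system matrices. The plant, cost weights, and gains below are hypothetical illustrations, not taken from the paper, and the sketch omits the game-theoretic (leader-follower and saddle-point) structure entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stable 2nd-order discrete-time plant (illustrative only)
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B = np.array([[0.0], [0.1]])
Qc = np.eye(2)          # state cost weight
Rc = np.eye(1)          # input cost weight
n, m = 2, 1

def collect(K, N=400):
    """Run the plant under an exciting behavior input u = -Kx + noise."""
    X, U, Xn = [], [], []
    x = rng.standard_normal(n)
    for _ in range(N):
        u = -K @ x + 0.5 * rng.standard_normal(m)   # excitation keeps data rich
        xn = A @ x + B @ u
        X.append(x); U.append(u); Xn.append(xn)
        x = xn if np.linalg.norm(xn) < 1e3 else rng.standard_normal(n)
    return np.array(X), np.array(U), np.array(Xn)

def q_evaluate(K, X, U, Xn):
    """Least-squares solution of the Q-function Bellman equation for gain K,
    using data only (no A, B):  z'Hz - z_+' H z_+ = x'Qx + u'Ru."""
    Phi, y = [], []
    for x, u, xn in zip(X, U, Xn):
        z  = np.concatenate([x, u])                 # behavior state-input pair
        zn = np.concatenate([xn, -K @ xn])          # target policy at next state
        Phi.append(np.kron(z, z) - np.kron(zn, zn))
        y.append(x @ Qc @ x + u @ Rc @ u)
    h, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = h.reshape(n + m, n + m)
    return 0.5 * (H + H.T)                          # enforce symmetry

K = np.zeros((m, n))                                # stabilizing initial gain
for _ in range(10):                                 # policy iteration
    X, U, Xn = collect(K)
    H = q_evaluate(K, X, U, Xn)
    K = np.linalg.solve(H[n:, n:], H[n:, :n])       # greedy improvement step

print("learned gain K =", K)
```

Because the behavior input (initial gain plus exploration noise) differs from the evaluated target policy, the scheme is off-policy in the same sense as the abstract: the excitation input, not the policy under real-time update, drives the system while learning proceeds.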
Pages: 16