Multi-player H∞ Differential Game using On-Policy and Off-Policy Reinforcement Learning

Cited by: 0

Authors:

An, Peiliang [1]

Liu, Mushuang [1]

Wan, Yan [1]

Lewis, Frank L. [2]

Affiliations:

[1] Univ Texas Arlington, Dept Elect Engn, Arlington, TX 76019 USA

[2] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX USA

Source:

2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA) | 2020

Funding:

U.S. National Science Foundation;

Keywords:

TRACKING CONTROL; TIME-SYSTEMS; ALGORITHMS;

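DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;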

Abstract:

This paper studies a multi-player H-infinity differential game for systems with general linear dynamics. In this game, multiple players design their control inputs to minimize their cost functions in the presence of worst-case disturbances. We first derive the optimal control and disturbance policies using the solutions to Hamilton-Jacobi-Isaacs (HJI) equations. We then prove that the derived optimal policies stabilize the system and constitute a Nash equilibrium solution. Two integral reinforcement learning (IRL)-based algorithms, policy iteration IRL and off-policy IRL, are developed to solve the differential game online. We show that the off-policy IRL can solve the multi-player H-infinity differential game online without using any system dynamics information. Simulation studies are conducted to validate the theoretical analysis and demonstrate the effectiveness of the developed learning algorithms.
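
The policy iteration idea mentioned in the abstract can be illustrated with a much simpler problem than the one the paper treats. The sketch below is not the authors' algorithm: it is a model-based iteration (it uses the system parameters, unlike the off-policy IRL described above) for a single-player, scalar system, and every name in it (`solve_scalar_hinf`, `a`, `b`, `d`, `q`, `r`, `gamma`) is an assumption made for illustration. It assumes dynamics dx/dt = a*x + b*u + d*w, cost integrand q*x^2 + r*u^2 - gamma^2*w^2, and a quadratic value function V(x) = p*x^2, for which the HJI equation reduces to the scalar game Riccati equation 2*a*p + q - p^2*(b^2/r - d^2/gamma^2) = 0.

```python
def solve_scalar_hinf(a, b, d, q, r, gamma, iters=50, tol=1e-12):
    """Policy iteration for a scalar H-infinity problem (illustrative only).

    Assumed model: dx/dt = a*x + b*u + d*w, with control u = -k*x and
    disturbance w = l*x. Returns (p, k, l) where V(x) = p*x**2.
    Assumes a < 0 so that the initial policy k = l = 0 is stabilizing.
    """
    k, l, p = 0.0, 0.0, 0.0
    for _ in range(iters):
        a_cl = a - b * k + d * l  # closed-loop pole under current policies
        if a_cl >= 0:
            raise ValueError("current policies do not stabilize the system")
        # Policy evaluation: solve 2*a_cl*p + q + r*k^2 - gamma^2*l^2 = 0 for p.
        p_new = (q + r * k * k - gamma**2 * l * l) / (-2.0 * a_cl)
        # Policy improvement for the controller and the disturbance.
        k = b * p_new / r
        l = d * p_new / gamma**2
        converged = abs(p_new - p) < tol
        p = p_new
        if converged:
            break
    return p, k, l


if __name__ == "__main__":
    p, k, l = solve_scalar_hinf(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0)
    print("value coefficient p:", p)
    print("control gain k:", k, "disturbance gain l:", l)
```

In this scalar case, updating both gains at once coincides with Newton's method on the Riccati equation, which is why a handful of iterations suffice. The multi-player, data-driven setting of the paper requires more machinery than this sketch shows.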

Pages: 1137-1142

Page count: 6
