Multi-player H∞ Differential Game using On-Policy and Off-Policy Reinforcement Learning

Authors
An, Peiliang [1]
Liu, Mushuang [1]
Wan, Yan [1]
Lewis, Frank L. [2]
Affiliations
[1] Univ Texas Arlington, Dept Elect Engn, Arlington, TX 76019 USA
[2] Univ Texas Arlington, UTA Res Inst, Ft Worth, TX USA
Source
2020 IEEE 16TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION (ICCA) | 2020
Funding
National Science Foundation (USA)
Keywords
TRACKING CONTROL; TIME-SYSTEMS; ALGORITHMS;
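DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract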
This paper studies a multi-player H-infinity differential game for systems with general linear dynamics. In this game, multiple players design their control inputs to minimize their cost functions in the presence of worst-case disturbances. We first derive the optimal control and disturbance policies using the solutions to Hamilton-Jacobi-Isaacs (HJI) equations. We then prove that the derived optimal policies stabilize the system and constitute a Nash equilibrium solution. Two integral reinforcement learning (IRL)-based algorithms, policy iteration IRL and off-policy IRL, are developed to solve the differential game online. We show that the off-policy IRL can solve the multi-player H-infinity differential game online without using any system dynamics information. Simulation studies are conducted to validate the theoretical analysis and demonstrate the effectiveness of the developed learning algorithms.
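As a rough illustration of the policy-iteration idea named in the abstract, the sketch below solves a scalar, single-player H-infinity game. It is not the paper's algorithm: it is model-based (it uses the system parameters directly, whereas the abstract's off-policy IRL is stated to need no dynamics information), and the scalar system dx/dt = a*x + b*u + d*w, the cost q*x^2 + r*u^2 - gamma^2*w^2, and all function and parameter names are assumptions made here for illustration.

```python
def solve_scalar_game(a, b, d, q, r, gamma, iters=20):
    """Model-based policy iteration for an assumed scalar zero-sum game.

    Dynamics (assumed): dx/dt = a*x + b*u + d*w
    Cost (assumed):     integral of q*x^2 + r*u^2 - gamma^2*w^2
    Policies:           u = -k*x (control), w = l*x (disturbance)
    Returns the value coefficient p (V(x) = p*x^2) and the gains k, l.
    """
    # Start from zero gains; this requires a < 0 so the start is stabilizing.
    k, l = 0.0, 0.0
    p = 0.0
    for _ in range(iters):
        a_cl = a - b * k + d * l  # closed-loop pole under the current policies
        if a_cl >= 0:
            raise ValueError("closed loop is not stable; cannot evaluate policy")
        # Policy evaluation: solve 2*a_cl*p + q + r*k^2 - gamma^2*l^2 = 0 for p.
        p = (q + r * k**2 - gamma**2 * l**2) / (-2.0 * a_cl)
        # Policy improvement for both the control and the disturbance.
        k = b * p / r
        l = d * p / gamma**2
    return p, k, l


def gare_residual(p, a, b, d, q, r, gamma):
    """Residual of the scalar game algebraic Riccati equation at p."""
    return 2.0 * a * p + q - p**2 * (b**2 / r - d**2 / gamma**2)


if __name__ == "__main__":
    params = dict(a=-1.0, b=1.0, d=1.0, q=1.0, r=1.0, gamma=2.0)
    p, k, l = solve_scalar_game(**params)
    print("p =", p, "k =", k, "l =", l)
    print("Riccati residual =", gare_residual(p, **params))
```

For the example parameters, the Riccati equation reduces to 0.75*p^2 + 2*p - 1 = 0, whose positive root is (sqrt(7) - 2)/1.5, roughly 0.4305, so the iteration can be checked against a closed form. Updating both gains simultaneously is equivalent to Newton's method on this scalar equation; convergence for other parameter choices is not guaranteed by this sketch.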
Pages: 1137-1142
Page count: 6