Data-Driven Inverse Reinforcement Learning Control for Linear Multiplayer Games

Cited by: 30
Authors
Lian, Bosen [1 ]
Donge, Vrushabh S. [1 ]
Lewis, Frank L. [1 ]
Chai, Tianyou [2 ,3 ]
Davoudi, Ali [1 ]
Affiliations
[1] Univ Texas Arlington, Dept Elect Engn, Arlington, TX 76019 USA
[2] Northeastern Univ, State Key Lab Synthet Automat Proc Ind, Shenyang 110819, Peoples R China
[3] Northeastern Univ, Int Joint Res Lab Integrated Automat, Shenyang 110819, Peoples R China
Keywords
Games; Cost function; Optimal control; Heuristic algorithms; Trajectory; System dynamics; Costs; Inverse optimal control (IOC); inverse RL; nonzero-sum Nash games; off-policy; optimal control; CONTINUOUS-TIME; IDENTIFICATION;
DOI
10.1109/TNNLS.2022.3186229
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This article proposes a data-driven inverse reinforcement learning (RL) control algorithm for nonzero-sum multiplayer games in linear continuous-time differential dynamical systems. The inverse RL problem in these games is solved by a learner that reconstructs the unknown expert players' cost functions from the expert's demonstrated optimal state and control-input trajectories. The learner thus obtains the same feedback control gains and trajectories as the expert, using only data measured along system trajectories and without knowledge of the system dynamics. This article first proposes a model-based inverse RL policy iteration framework consisting of: 1) a policy evaluation step that reconstructs cost matrices using Lyapunov functions; 2) a state-reward weight improvement step using inverse optimal control (IOC); and 3) a policy improvement step using optimal control. Building on this model-based policy iteration algorithm, the article then develops an online data-driven off-policy inverse RL algorithm that requires no knowledge of the system dynamics or of the expert's control gains. Rigorous convergence and stability analyses of the algorithms are provided, showing that the off-policy inverse RL algorithm guarantees unbiased solutions even when probing noise is added to satisfy the persistence of excitation (PE) condition. Finally, two different simulation examples validate the effectiveness of the proposed algorithms.
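The single-player linear-quadratic case gives a compact illustration of the IOC idea underlying the abstract: reconstruct a state weight Q for which the expert's demonstrated feedback gain is optimal, so that the learner reproduces the expert's gain and closed-loop trajectories. The sketch below is not the article's multiplayer, data-driven off-policy algorithm; it is a minimal model-based illustration assuming A, B, the input weight R, and the expert gain K* are known, and it uses the stationarity condition B'P = R K* together with a Lyapunov-type identity to recover Q (Python with NumPy/SciPy).

    import numpy as np
    from scipy.linalg import solve_continuous_are

    # Expert's (hidden) problem: minimize the integral of x'Q_e x + u'R u.
    A = np.array([[0.0, 1.0], [-1.0, -2.0]])
    B = np.array([[0.0], [1.0]])
    Q_expert = np.diag([5.0, 1.0])        # unknown to the learner
    R = np.array([[1.0]])                 # assumed known to the learner

    P_expert = solve_continuous_are(A, B, Q_expert, R)
    K_star = np.linalg.solve(R, B.T @ P_expert)   # expert's optimal feedback gain

    # Learner's IOC reconstruction.
    # Step 1: find a symmetric P consistent with the stationarity condition
    #         B' P = R K_star (least squares over the free entries of P).
    n = A.shape[0]
    iu = np.triu_indices(n)

    def sym(p):
        P = np.zeros((n, n))
        P[iu] = p
        return P + P.T - np.diag(np.diag(P))

    m = len(iu[0])
    cols = [(B.T @ sym(np.eye(m)[k])).ravel() for k in range(m)]
    p_hat, *_ = np.linalg.lstsq(np.column_stack(cols), (R @ K_star).ravel(), rcond=None)
    P_hat = sym(p_hat)

    # Step 2 (state-reward weight improvement via IOC): rearrange the closed-loop
    # Lyapunov identity to recover a state weight for which K_star is optimal.
    Ac = A - B @ K_star
    Q_hat = -Ac.T @ P_hat - P_hat @ Ac - K_star.T @ R @ K_star

    # Step 3 (policy improvement via optimal control): the reconstructed cost
    # reproduces the expert's gain, hence the same closed-loop trajectories.
    K_learner = np.linalg.solve(R, B.T @ P_hat)
    ric = A.T @ P_hat + P_hat @ A + Q_hat - P_hat @ B @ np.linalg.solve(R, B.T @ P_hat)
    print("gain error      :", np.linalg.norm(K_learner - K_star))   # ~ 0
    print("Riccati residual:", np.linalg.norm(ric))                  # ~ 0
    # Q_hat need not equal Q_expert (IOC solutions are not unique), but the
    # learner's gain and trajectories match the expert's. Semidefiniteness
    # constraints on P and Q are omitted in this illustrative sketch.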
Pages: 2028-2041
Number of pages: 14