Game-Theoretic Inverse Reinforcement Learning: A Differential Pontryagin's Maximum Principle Approach

Cited by: 10
Authors
Cao, Kun [1 ]
Xie, Lihua [1 ]
Affiliation
[1] Nanyang Technol Univ, Sch Elect & Elect Engn, Singapore 639798, Singapore
Keywords
Games; Trajectory; Mathematical models; Costs; Entropy; Cost function; Computer science; Maximum principle; multistage game; reinforcement learning (RL)
DOI
10.1109/TNNLS.2022.3148376
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405
Abstract
This brief proposes a game-theoretic inverse reinforcement learning (GT-IRL) framework that learns the parameters of both the dynamic system and the individual cost functions of multistage games from demonstrated trajectories. Unlike the probabilistic approaches favored in the computer science community and the residual-minimization solutions favored in the control community, this framework addresses the problem in a deterministic setting by differentiating the Pontryagin's maximum principle (PMP) equations of an open-loop Nash equilibrium (OLNE), an idea inspired by Jin et al. (2020). The differentiated equations for a multiplayer nonzero-sum multistage game are shown to be equivalent to the PMP equations of another affine-quadratic nonzero-sum multistage game and can be solved by explicit recursions. A similar result is established for two-player zero-sum games. Simulation examples demonstrate the effectiveness of the proposed algorithms.
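The OLNE conditions that the framework differentiates are the standard discrete-time PMP equations. The following is a generic sketch for player i in an N-player multistage game; the notation (f_t, c_t^i, h^i, H_t^i, lambda_t^i) is assumed here for illustration and is not taken from the paper itself:

```latex
% Dynamics: x_{t+1} = f_t(x_t, u_t^1, \dots, u_t^N), \quad t = 0, \dots, T-1.
% Player i's cost: J^i = \sum_{t=0}^{T-1} c_t^i(x_t, u_t^1, \dots, u_t^N) + h^i(x_T).
% Player i's Hamiltonian with costate \lambda_{t+1}^i:
H_t^i = c_t^i(x_t, u_t^1, \dots, u_t^N) + (\lambda_{t+1}^i)^\top f_t(x_t, u_t^1, \dots, u_t^N)
% Costate recursion (backward in time) with terminal condition:
\lambda_t^i = \frac{\partial H_t^i}{\partial x_t}, \qquad \lambda_T^i = \frac{\partial h^i}{\partial x_T}
% Stationarity of each player's own control at the OLNE:
0 = \frac{\partial H_t^i}{\partial u_t^i}
```

Per the abstract, differentiating this system with respect to the unknown parameters yields equations equivalent to the PMP system of an affine-quadratic nonzero-sum multistage game, which admits solution by explicit recursions.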
Pages: 9506 - 9513
Page count: 8
Related Papers
41 records in total
  • [1] From inverse optimal control to inverse reinforcement learning: A historical review
    Ab Azar, Nematollah
    Shahmansoorian, Aref
    Davoudi, Mohsen
    [J]. ANNUAL REVIEWS IN CONTROL, 2020, 50 : 119 - 138
  • [2] Abbeel P., 2004, Proceedings of the 21st International Conference on Machine Learning, p. 1, DOI 10.1145/1015330.1015430
  • [3] Amodei D, 2016, ARXIV
  • [4] CasADi: a software framework for nonlinear optimization and optimal control
    Andersson, Joel A. E.
    Gillis, Joris
    Horn, Greg
    Rawlings, James B.
    Diehl, Moritz
    [J]. MATHEMATICAL PROGRAMMING COMPUTATION, 2019, 11 (01) : 1 - 36
  • [5] Basar, 2018, Handbook of Dynamic Game Theory
  • [6] Bertsekas D. P., 2005, Dynamic Programming and Optimal Control, Vol. 2
  • [7] Di, 2019, arXiv:1906.09097
  • [8] Inverse KKT: Learning cost functions of manipulation tasks from demonstrations
    Englert, Peter
    Ngo Anh Vien
    Toussaint, Marc
    [J]. INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2017, 36 (13-14) : 1474 - 1488
  • [9] Freeman R., 1996, Robust Nonlinear Control Design, DOI 10.1007/978-0-8176-4759-9
  • [10] The greedy crowd and smart leaders: a hierarchical strategy selection game with learning protocol
    Guo, Linghui
    Liu, Zhongxin
    Chen, Zengqiang
    [J]. SCIENCE CHINA-INFORMATION SCIENCES, 2021, 64 (03)