Value Approximation for Two-Player General-Sum Differential Games With State Constraints

Cited by: 0
Authors
Zhang, Lei [1]
Ghimire, Mukesh [1]
Zhang, Wenlong [2]
Xu, Zhe [1]
Ren, Yi [1]
Affiliations
[1] Arizona State Univ, Dept Mech & Aerosp Engn, Tempe, AZ 85287 USA
[2] Arizona State Univ, Sch Mfg Syst & Networks, Ira A Fulton Sch Engn, Mesa, AZ 85212 USA
Keywords
Safety; Games; Differential games; Robots; Neural networks; Mathematical models; Human-robot interaction; General-sum differential game; physics-informed neural network (PINN); safe human-robot interactions; INFORMED NEURAL-NETWORKS; INFORMATION; FRAMEWORK
DOI
10.1109/TRO.2024.3411850
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
Solving Hamilton-Jacobi-Isaacs (HJI) PDEs numerically enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD). While physics-informed neural networks (PINNs) have shown promise in alleviating CoD in solving PDEs, vanilla PINNs fall short in learning discontinuous solutions due to their sampling nature, leading to poor safety performance of the resulting policies when values are discontinuous due to state or temporal logic constraints. In this study, we explore three potential solutions to this challenge: 1) a hybrid learning method that is guided by both supervisory equilibria and the HJI PDE; 2) a value-hardening method, where a sequence of HJI PDEs is solved with an increasing Lipschitz constant on the constraint violation penalty; and 3) an epigraphical technique that lifts the value to a higher-dimensional state space where it becomes continuous. Evaluations through 5-D and 9-D vehicle and 13-D drone simulations reveal that the hybrid method outperforms the others in terms of generalization and safety performance by taking advantage of both the supervisory equilibrium values and co-states, and the low cost of PINN loss gradients.
Pages: 4837-4855
Page count: 19
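
The value-hardening idea mentioned in the abstract (solving a sequence of problems whose constraint-violation penalty has an increasing Lipschitz constant) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the logistic form of the penalty, the names `softened_penalty`, `lipschitz_constant`, and `hardening_schedule`, and the geometric schedule. The logistic function has slope at most 1/4, so a penalty `scale * sigmoid(-sharpness * distance)` has Lipschitz constant `scale * sharpness / 4`, and raising `sharpness` moves the smooth penalty toward a discontinuous step.

```python
import math


def softened_penalty(distance, sharpness, scale=1.0):
    """Smooth stand-in for a discontinuous constraint-violation penalty.

    distance > 0: constraint satisfied; distance < 0: constraint violated.
    The logistic shape is a hypothetical choice for this sketch.
    """
    z = -sharpness * distance
    # Numerically stable evaluation of scale * sigmoid(z).
    if z >= 0:
        return scale / (1.0 + math.exp(-z))
    e = math.exp(z)
    return scale * e / (1.0 + e)


def lipschitz_constant(sharpness, scale=1.0):
    """Lipschitz constant of softened_penalty (logistic slope is at most 1/4)."""
    return scale * sharpness / 4.0


def hardening_schedule(start=1.0, factor=4.0, stages=5):
    """Hypothetical geometric schedule of increasing sharpness values."""
    return [start * factor ** i for i in range(stages)]


if __name__ == "__main__":
    for k in hardening_schedule():
        # In a value-hardening loop, a value network would be trained (or
        # warm-started from the previous stage) against the penalty at this
        # sharpness; that training step is omitted here.
        print(
            f"sharpness={k:7.1f}  Lipschitz={lipschitz_constant(k):6.2f}  "
            f"penalty(violated by 0.1)={softened_penalty(-0.1, k):.4f}  "
            f"penalty(safe by 0.1)={softened_penalty(0.1, k):.4f}"
        )
```

Each stage only changes the penalty; warm-starting from the previous stage is what would let a learner approach the discontinuous value gradually, which is the motivation the abstract gives for this family of methods.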