Value Approximation for Two-Player General-Sum Differential Games With State Constraints

Cited by: 0

Authors:

Zhang, Lei [1]

Ghimire, Mukesh [1]

Zhang, Wenlong [2]

Xu, Zhe [1]

Ren, Yi [1]

Affiliations:

[1] Arizona State Univ, Dept Mech & Aerosp Engn, Tempe, AZ 85287 USA

[2] Arizona State Univ, Sch Mfg Syst & Networks, Ira A Fulton Sch Engn, Mesa, AZ 85212 USA

Source:

IEEE TRANSACTIONS ON ROBOTICS | 2024 / Vol. 40

Keywords:

Safety; Games; Differential games; Robots; Neural networks; Mathematical models; Human-robot interaction; General-sum differential game; physics-informed neural network (PINN); safe human-robot interactions; INFORMED NEURAL-NETWORKS; INFORMATION; FRAMEWORK;

DOI:

10.1109/TRO.2024.3411850

Chinese Library Classification (CLC) number:

TP24 [Robotics]

Discipline classification codes:

080202; 1405

Abstract:

Solving Hamilton-Jacobi-Isaacs (HJI) PDEs numerically enables equilibrial feedback control in two-player differential games, yet faces the curse of dimensionality (CoD). While physics-informed neural networks (PINNs) have shown promise in alleviating CoD in solving PDEs, vanilla PINNs fall short in learning discontinuous solutions due to their sampling nature, leading to poor safety performance of the resulting policies when values are discontinuous due to state or temporal logic constraints. In this study, we explore three potential solutions to this challenge: 1) a hybrid learning method that is guided by both supervisory equilibria and the HJI PDE, 2) a value-hardening method where a sequence of HJIs are solved with increasing Lipschitz constant on the constraint violation penalty, and 3) the epigraphical technique that lifts the value to a higher dimensional state space where it becomes continuous. Evaluations through 5-D and 9-D vehicle and 13-D drone simulations reveal that the hybrid method outperforms others in terms of generalization and safety performance by taking advantage of both the supervisory equilibrium values and co-states, and the low cost of PINN loss gradients.

Citation

Pages: 4837-4855

Page count: 19

References: 63 in total

[1] Altarovici, Albert; Bokanowski, Olivier; Zidani, Hasnaa. A GENERAL HAMILTON-JACOBI FRAMEWORK FOR NON-LINEAR STATE-CONSTRAINED CONTROL PROBLEMS [J]. ESAIM-CONTROL OPTIMISATION AND CALCULUS OF VARIATIONS, 2013, 19 (02): 337-357
[2] Aumann R. J., 1995, Repeated games with incomplete information
[3] Bansal, Somil; Tomlin, Claire J. DeepReach: A Deep Learning Approach to High-Dimensional Reachability [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021: 1817-1824
[4] Bellman R., 1965, Dynamic Programming and Modern Control Theory
[5] Bengio Y., 2009, P 26 ANN INT C MACHI, P41
[6] Bressan, Alberto. Noncooperative Differential Games [J]. MILAN JOURNAL OF MATHEMATICS, 2011, 79 (02): 357-427
[7] Bui M., 2022, arXiv
[8] Cardaliaguet, P. Differential games with asymmetric information [J]. SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2007, 46 (03): 816-838
[9] Cardaliaguet, Pierre; Rainer, Catherine. Games with Incomplete Information in Continuous Time and for Continuous Types [J]. DYNAMIC GAMES AND APPLICATIONS, 2012, 2 (02): 206-227
[10] Cardaliaguet, Pierre. Numerical Approximation and Optimal Strategies for Differential Games with Lack of Information on One Side [J]. ADVANCES IN DYNAMIC GAMES AND THEIR APPLICATIONS, 2009, 10: 159-176
