Optimal game theoretic solution of the pursuit-evasion intercept problem using on-policy reinforcement learning

Cited by: 27
Authors
Kartal, Yusuf [1,2]
Subbarao, Kamesh [2]
Dogan, Atilla [2]
Lewis, Frank [1,3]
Affiliations
[1] University of Texas at Arlington Research Institute, Automation & Intelligent Systems Division, Fort Worth, TX 76118, USA
[2] University of Texas at Arlington, Mechanical & Aerospace Engineering, Fort Worth, TX, USA
[3] University of Texas at Arlington, Electrical Engineering, Arlington, TX 76019, USA
Funding
U.S. National Science Foundation
Keywords
nonlinear backstepping control; online reinforcement learning; optimal constrained control; pursuit-evasion game; three-dimensional nonlinear systems; zero-sum games; guidance laws; systems
DOI
10.1002/rnc.5719
Chinese Library Classification
TP [automation technology, computer technology]
Discipline code
0812
Abstract
This article presents a rigorous formulation of the pursuit-evasion (PE) game when velocity constraints are imposed on the agents of the game, or players. The game is formulated as an infinite-horizon problem using a non-quadratic functional, and sufficient conditions are derived to prove capture in finite time. A novel tracking Hamilton-Jacobi-Isaacs (HJI) equation associated with the non-quadratic value function is employed and solved for the Nash-equilibrium velocity policies of each agent with arbitrary nonlinear dynamics. In contrast to existing approaches for proving capture in PE games, the proposed method does not assume that players move at their maximum velocities; instead, it accounts for the velocity constraints a priori. Attaining the optimal actions requires solving the HJI equation online and in real time. We overcome this problem with an on-policy integral reinforcement learning (IRL) technique. The persistence-of-excitation condition required for IRL is satisfied inherently until capture occurs, at which time the game ends. Furthermore, a nonlinear backstepping control method is proposed to track the desired optimal velocity trajectories for players with generalized Newtonian dynamics. Simulation results are provided to show the validity of the proposed methods.
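To make the IRL idea mentioned in the abstract concrete, the sketch below applies on-policy integral reinforcement learning to a scalar linear-quadratic regulator rather than to the paper's nonlinear zero-sum game; the plant parameters, cost weights, reinforcement interval, and initial gain are illustrative assumptions, not values from the article. The key feature it shows is that policy evaluation uses only trajectory data over an interval T, so the Riccati equation (the HJI equation's role in the paper) is never solved directly.

```python
import math

# Illustrative scalar plant xdot = a*x + b*u with cost integral of (q*x^2 + r*u^2)
a, b = 1.0, 1.0
q, r = 1.0, 1.0
T = 0.5          # reinforcement (data-collection) interval
K = 2.0          # initial stabilizing gain, so a - b*K < 0

def irl_value(K, x0=1.0):
    """One IRL evaluation step: find P satisfying the interval Bellman equation
    P*x(t)^2 = (integral over [t, t+T] of (q + r*K^2)*x^2) + P*x(t+T)^2,
    using the closed-loop trajectory x(tau) = x0 * exp((a - b*K)*tau)."""
    lam = a - b * K                       # closed-loop pole (stable for K > a/b)
    xT = x0 * math.exp(lam * T)
    # integral of x0^2 * exp(2*lam*tau) over [0, T]
    integral = (q + r * K**2) * x0**2 * (math.exp(2 * lam * T) - 1) / (2 * lam)
    return integral / (x0**2 - xT**2)

for _ in range(10):
    P = irl_value(K)                      # policy evaluation from trajectory data
    K = b * P / r                         # policy improvement

# Analytic Riccati solution for comparison (never used by the iteration itself)
P_are = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
print(round(P, 6), round(P_are, 6))       # the two values agree after convergence
```

This evaluation-improvement loop is the scalar analogue of the on-policy iteration described in the abstract; in the paper's game setting both the pursuer's and evader's policies would be updated against the non-quadratic value function instead.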
Pages: 7886-7903
Page count: 18
References
32 total (first 10 listed)
[1] Abu-Khalaf, M.; Lewis, F. L. Nearly optimal control laws for nonlinear systems with saturating actuators using a neural network HJB approach. Automatica, 2005, 41(5): 779-791.
[2] Al-Tamimi, A.; Lewis, F. L.; Abu-Khalaf, M. Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control. Automatica, 2007, 43(3): 473-481.
[3] Basar, T. H-infinity Optimal Control and Related Minimax Design Problems: A Dynamic Game Approach. 2008.
[4] Bhattacharya, S.; Basar, T.; Hovakimyan, N. A visibility-based pursuit-evasion game with a circular obstacle. Journal of Optimization Theory and Applications, 2016, 171(3): 1071-1082.
[5] Bryson, A. E. Applied Optimal Control: Optimization, Estimation, and Control. 1975.
[6] Cannarsa, P. Progress in Nonlinear Differential Equations and Their Applications, Vol. 58. 2004.
[7] Carr, R. W.; Cobb, R. G.; Pachter, M.; Pierce, S. Solution of a pursuit-evasion game using a near-optimal strategy. Journal of Guidance, Control, and Dynamics, 2018, 41(4): 841-850.
[8] Dong, Y. Aerospace Science and Technology, 2020, Vol. 99.
[9] Gong, H. IEEE Transactions on Aerospace and Electronic Systems, 2020.
[10] Haddad, W. M. Nonlinear Dynamical Systems and Control: A Lyapunov-Based Approach. 2011.