Approximate N-Player Nonzero-Sum Game Solution for an Uncertain Continuous Nonlinear System

Cited by: 74

Authors:

Johnson, Marcus [1]

Kamalapurkar, Rushikesh [1]

Bhasin, Shubhendu [2]

Dixon, Warren E. [1]

Affiliations:

[1] Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL 32611 USA

[2] IIT Delhi, Dept Elect Engn, New Delhi 110016, India

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2015, Vol. 26, No. 8

Funding:

U.S. National Science Foundation

Keywords:

Actor-critic (AC) methods; adaptive control; adaptive dynamic programming; differential games; optimal control; ADAPTIVE CRITIC DESIGNS; H-INFINITY CONTROL; NEURAL-NETWORKS; CONTINUOUS-TIME; CONTROLLER; SOLVE;

DOI:

10.1109/TNNLS.2014.2350835

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

An approximate online equilibrium solution is developed for an N-player nonzero-sum game subject to continuous-time nonlinear unknown dynamics and an infinite-horizon quadratic cost. A novel actor-critic-identifier structure is used, wherein a robust dynamic neural network (NN) asymptotically identifies the uncertain system with additive disturbances, and a set of critic and actor NNs approximates the value functions and equilibrium policies, respectively. The weight update laws for the actor NNs are generated using a gradient-descent method, and those for the critic NNs are generated by least-squares regression; both are based on the modified Bellman error that is independent of the system dynamics. A Lyapunov-based stability analysis shows that uniformly ultimately bounded tracking is achieved, and a convergence analysis demonstrates that the approximate control policies converge to a neighborhood of the optimal solutions. The actor, critic, and identifier structures are implemented in real time, continuously and simultaneously. Simulations on two- and three-player games illustrate the performance of the developed method.
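The division of labor between critic and actor described in the abstract can be illustrated with a deliberately simplified sketch. It is not the paper's update laws: it assumes a single player, a scalar linear system with known dynamics (so no identifier NN), a one-weight quadratic value function V(x) = w x^2, batch rather than online updates, and an actor that takes gradient steps toward the critic weight instead of descending the Bellman error directly. All names (`bellman_error_terms`, `train`, `actor_step`) and all parameter values are hypothetical.

```python
import numpy as np


def bellman_error_terms(x, w_actor, a, b, q, r):
    """Return (regressor, cost) with Bellman error = w_critic * regressor + cost.

    Toy setting (assumed, not from the paper): dynamics x_dot = a*x + b*u,
    running cost q*x^2 + r*u^2, value estimate V(x) = w_critic * x^2.
    """
    k = b * w_actor / r          # actor policy u = -(b/r) * w_actor * x
    u = -k * x
    x_dot = a * x + b * u        # known dynamics stand in for the identifier
    regressor = 2.0 * x * x_dot  # time derivative of x^2 along the trajectory
    cost = q * x**2 + r * u**2
    return regressor, cost


def train(a=-1.0, b=1.0, q=1.0, r=1.0, actor_step=0.5, iterations=60):
    """Alternate a least-squares critic fit with a gradient-descent actor step."""
    x = np.linspace(-2.0, 2.0, 41)  # sampled states used for the regression
    w_actor, w_critic = 0.0, 0.0
    for _ in range(iterations):
        regressor, cost = bellman_error_terms(x, w_actor, a, b, q, r)
        # Critic: least-squares weight that drives the Bellman error to zero.
        w_critic = np.linalg.lstsq(regressor[:, None], -cost, rcond=None)[0][0]
        # Actor: one gradient step on 0.5 * (w_actor - w_critic)^2.
        w_actor -= actor_step * (w_actor - w_critic)
    return w_actor, w_critic


if __name__ == "__main__":
    w_actor, w_critic = train()
    print(w_actor, w_critic)
```

For these assumed parameters the scalar Riccati equation 2ap - (b^2/r)p^2 + q = 0 has the positive root p = sqrt(2) - 1, which gives a reference value against which both weights can be checked.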


Pages: 1645-1658

Page count: 14

