Approximate N-Player Nonzero-Sum Game Solution for an Uncertain Continuous Nonlinear System

Cited by: 74

Authors:

Johnson, Marcus [1]

Kamalapurkar, Rushikesh [1]

Bhasin, Shubhendu [2]

Dixon, Warren E. [1]

Affiliations:

[1] Univ Florida, Dept Mech & Aerosp Engn, Gainesville, FL 32611 USA

[2] IIT Delhi, Dept Elect Engn, New Delhi 110016, India

Source:

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS | 2015 / Vol. 26 / No. 08

Funding:

National Science Foundation (USA);

Keywords:

Actor-critic (AC) methods; adaptive control; adaptive dynamic programming; differential games; optimal control; ADAPTIVE CRITIC DESIGNS; H-INFINITY CONTROL; NEURAL-NETWORKS; CONTINUOUS-TIME; CONTROLLER; SOLVE;

DOI:

10.1109/TNNLS.2014.2350835

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

An approximate online equilibrium solution is developed for an N-player nonzero-sum game subject to continuous-time nonlinear unknown dynamics and an infinite-horizon quadratic cost. A novel actor-critic-identifier structure is used, wherein a robust dynamic neural network (NN) asymptotically identifies the uncertain system with additive disturbances, and a set of critic and actor NNs approximates the value functions and equilibrium policies, respectively. The weight update laws for the actor NNs are generated using a gradient-descent method, and those for the critic NNs are generated by least-squares regression; both are based on the modified Bellman error, which is independent of the system dynamics. A Lyapunov-based stability analysis shows that uniformly ultimately bounded tracking is achieved, and a convergence analysis demonstrates that the approximate control policies converge to a neighborhood of the optimal solutions. The actor, critic, and identifier structures are implemented in real time, continuously and simultaneously. Simulations of two- and three-player games illustrate the performance of the developed method.
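
For orientation, the setting summarized in the abstract can be sketched in the notation commonly used for N-player nonzero-sum differential games. The symbols below ($f$, $g_j$, $Q_i$, $R_{ij}$, $V_i^{*}$, $\hat{F}$, $\delta_i$), the control-affine form of the dynamics, and the symmetry of $R_{ii}$ are assumptions made for illustration; they are not taken from this record, and the paper's own notation and definitions may differ.

```latex
% Control-affine dynamics with N players (assumed form)
\dot{x} = f(x) + \sum_{j=1}^{N} g_j(x)\, u_j

% Infinite-horizon cost of player i, quadratic in the controls
J_i = \int_{0}^{\infty} \Big( Q_i(x) + \sum_{j=1}^{N} u_j^{\top} R_{ij}\, u_j \Big)\, dt

% Coupled Hamilton-Jacobi equations at a Nash equilibrium, i = 1, ..., N
0 = Q_i(x) + \sum_{j=1}^{N} u_j^{*\top} R_{ij}\, u_j^{*}
    + \nabla V_i^{*}(x) \Big( f(x) + \sum_{j=1}^{N} g_j(x)\, u_j^{*} \Big)

% Equilibrium feedback policy of player i (R_{ii} symmetric positive definite)
u_i^{*} = -\tfrac{1}{2}\, R_{ii}^{-1}\, g_i^{\top}(x)\, \big( \nabla V_i^{*}(x) \big)^{\top}

% Bellman error formed with approximations \hat{V}_i, \hat{u}_j and an
% identifier-based estimate \hat{F} of the state derivative (assumed notation)
\delta_i = Q_i(x) + \sum_{j=1}^{N} \hat{u}_j^{\top} R_{ij}\, \hat{u}_j
           + \nabla \hat{V}_i(x)\, \hat{F}
```

In this sketch the Bellman error $\delta_i$ uses the identifier's estimate $\hat{F}$ rather than the drift $f$, which is one way to read the abstract's statement that the modified Bellman error is independent of the system dynamics.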

Pages: 1645-1658

Page count: 14

References (59 in total):

[11] NEURONLIKE ADAPTIVE ELEMENTS THAT CAN SOLVE DIFFICULT LEARNING CONTROL-PROBLEMS [J].

BARTO, AG ;

SUTTON, RS ;

ANDERSON, CW .

IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1983, 13 (05) :834-846

[12]

Basar T., 2008, H OPTIMAL CONTROL RE

[13]

Basar T., 1995, H∞-Optimal Control and Related Minimax Design Problems

[14]

Basar T., 1999, SOC IND APPL MATH, V2nd

[15] Galerkin approximations of the generalized Hamilton-Jacobi-Bellman equation [J].

Beard, RW ;

Saridis, GN ;

Wen, JT .

AUTOMATICA, 1997, 33 (12) :2159-2177

[16] A novel actor-critic-identifier architecture for approximate optimal control of uncertain nonlinear systems [J].

Bhasin, S. ;

Kamalapurkar, R. ;

Johnson, M. ;

Vamvoudakis, K. G. ;

Lewis, F. L. ;

Dixon, W. E. .

AUTOMATICA, 2013, 49 (01) :82-92

[17] Asymptotic tracking by a reinforcement learning-based adaptive critic controller [J].

Bhasin S. ;

Sharma N. ;

Patre P. ;

Dixon W. .

Journal of Control Theory and Applications, 2011, 9 (3) :400-409

[18]

Bhasin S., 2011, THESIS U FLORIDA GAI

[19]

Campos J., 1999, Proceedings of the 1999 American Control Conference (Cat. No. 99CH36251), P2813, DOI 10.1109/ACC.1999.786585

[20] TOWARD A THEORY OF MANY PLAYER DIFFERENTIAL GAMES [J].

CASE, JH .

SIAM JOURNAL ON CONTROL, 1969, 7 (02) :179-&
