Iterative ADP learning algorithms for discrete-time multi-player games

Cited by: 56
Authors
Jiang, He [1 ]
Zhang, Huaguang [1 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; Approximate dynamic programming; Reinforcement learning; Neural network; ZERO-SUM GAMES; UNCERTAIN NONLINEAR-SYSTEMS; H-INFINITY CONTROL; CONSTRAINED-INPUT; POLICY ITERATION; EQUATION; DESIGNS;
DOI
10.1007/s10462-017-9603-1
CLC number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Adaptive dynamic programming (ADP) is an important branch of reinforcement learning for solving various optimal control problems. Most practical nonlinear systems are controlled by more than one controller; each controller is a player, and the tradeoff between cooperation and conflict among these players can be viewed as a game. Multi-player games fall into two main categories: zero-sum games and non-zero-sum games. To obtain the optimal control policy for each player, one needs to solve the Hamilton-Jacobi-Isaacs equations for zero-sum games and a set of coupled Hamilton-Jacobi equations for non-zero-sum games. Unfortunately, these equations are generally difficult or even impossible to solve analytically. To overcome this bottleneck, two ADP methods, a modified gradient-descent-based online algorithm and a novel iterative offline learning approach, are proposed in this paper. Furthermore, to implement the proposed methods, we employ a single-network structure, which significantly reduces the computational burden compared with the traditional multiple-network architecture. Simulation results demonstrate the effectiveness of our schemes.
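To make the iterative idea in the abstract concrete, the following is a minimal sketch, not the paper's algorithm: for the special case of a discrete-time linear-quadratic zero-sum game, the Hamilton-Jacobi-Isaacs equation reduces to a game algebraic Riccati equation, which value iteration can solve starting from a zero value function. All system matrices and weights below (`A`, `B`, `D`, `Q`, `R`, `gamma`) are illustrative assumptions, not taken from the paper.

```python
# Hedged sketch: value iteration on the game algebraic Riccati equation for a
# discrete-time linear-quadratic zero-sum game x_{k+1} = A x_k + B u_k + D w_k,
# with player 1 (control u) minimizing and player 2 (disturbance w) maximizing
# sum_k (x'Qx + u'Ru - gamma^2 w'w). Matrices are illustrative assumptions.
import numpy as np

A = np.array([[0.9, 0.1],
              [0.0, 0.8]])      # stable open-loop dynamics (assumed)
B = np.array([[0.0], [1.0]])    # control input channel (player 1)
D = np.array([[0.1], [0.0]])    # disturbance channel (player 2)
Q = np.eye(2)                   # state weight
R = np.array([[1.0]])           # control weight
gamma = 5.0                     # disturbance attenuation level (assumed large enough)

def gare_step(P):
    """One value-iteration sweep: V_{i+1}(x) = x' P_{i+1} x from V_i(x) = x' P_i x."""
    # Quadratic form in (u, w) obtained from the Isaacs saddle-point condition.
    M = np.block([[R + B.T @ P @ B,              B.T @ P @ D],
                  [D.T @ P @ B,  D.T @ P @ D - gamma**2 * np.eye(1)]])
    N = np.vstack([B.T @ P @ A, D.T @ P @ A])
    return Q + A.T @ P @ A - N.T @ np.linalg.solve(M, N)

P = np.zeros((2, 2))            # start from V_0(x) = 0, as in standard value iteration
for _ in range(300):
    P_next = gare_step(P)
    if np.max(np.abs(P_next - P)) < 1e-10:
        P = P_next
        break
    P = P_next
# The converged P gives the saddle-point value V*(x) = x' P x; the players'
# feedback gains follow from solving M [u; w] = -N x at the fixed point.
```

In the paper's more general nonlinear setting, the value function has no closed quadratic form, which is where a neural-network critic (the single-network structure mentioned in the abstract) replaces the matrix `P` in this recursion.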
Pages: 75-91
Page count: 17
Related papers
49 records in total
  • [1] Discrete-time nonlinear HJB solution using approximate dynamic programming: Convergence proof
    Al-Tamimi, Asma
    Lewis, Frank L.
    Abu-Khalaf, Murad
[J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2008, 38(4): 943-949
  • [2] Model-free Q-learning designs for linear discrete-time zero-sum games with application to H-infinity control
    Al-Tamimi, Asma
    Lewis, Frank L.
    Abu-Khalaf, Murad
[J]. AUTOMATICA, 2007, 43(3): 473-481
  • [3] Adaptive critic designs for discrete-time zero-sum games with application to H∞ control
    Al-Tamimi, Asma
    Abu-Khalaf, Murad
    Lewis, Frank L.
[J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART B-CYBERNETICS, 2007, 37(1): 240-247
  • [4] [Anonymous], IEEE T SYST MAN CYBE
  • [5] [Anonymous], 2017, IEEE T IND ELECT
  • [6] [Anonymous], IEEE T NEURAL NETW L
  • [7] [Anonymous], 2016, J NANOMATER, DOI [DOI 10.1016/J.FSIGEN.2016.01.005, DOI 10.1155/2016/2358276]
  • [8] Data-based approximate optimal control for nonzero-sum games of multi-player systems using adaptive dynamic programming
    Jiang, He
    Zhang, Huaguang
    Xiao, Geyang
    Cui, Xiaohong
[J]. NEUROCOMPUTING, 2018, 275: 192-199
  • [9] H∞ control with constrained input for completely unknown nonlinear systems using data-driven reinforcement learning method
    Jiang, He
    Zhang, Huaguang
    Luo, Yanhong
    Cui, Xiaohong
[J]. NEUROCOMPUTING, 2017, 237: 226-234
  • [10] Approximate N-Player Nonzero-Sum Game Solution for an Uncertain Continuous Nonlinear System
    Johnson, Marcus
    Kamalapurkar, Rushikesh
    Bhasin, Shubhendu
    Dixon, Warren E.
[J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2015, 26(8): 1645-1658