Neural-network-based learning algorithms for cooperative games of discrete-time multi-player systems with control constraints via adaptive dynamic programming

Cited by: 30
Authors
Jiang, He [1 ]
Zhang, Huaguang [1 ]
Xie, Xiangpeng [2 ]
Han, Ji [1 ]
Affiliations
[1] Northeastern Univ, Coll Informat Sci & Engn, Shenyang 110819, Liaoning, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Inst Adv Technol, Nanjing 210003, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Adaptive dynamic programming; Approximate dynamic programming; Reinforcement learning; Neural networks; Zero-sum games; Optimal tracking control; Nonlinear systems; Policy iteration;
DOI
10.1016/j.neucom.2018.02.107
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Adaptive dynamic programming (ADP), an important branch of reinforcement learning, is a powerful tool for solving various optimal control problems. However, cooperative game problems for discrete-time multi-player systems with control constraints have rarely been investigated in this field. To address this issue, a novel policy iteration (PI) algorithm is proposed based on the ADP technique, and its convergence analysis is also provided in this brief paper. For the proposed PI algorithm, an online neural network (NN) implementation scheme with a multiple-network structure is presented. In the online NN-based learning algorithm, a critic network, constrained actor networks, and unconstrained actor networks are employed to approximate the value function and the constrained and unconstrained control policies, respectively, and the NN weight updating laws are designed using the gradient descent method. Finally, a numerical simulation example is presented to demonstrate the effectiveness of the proposed method. (C) 2019 Elsevier B.V. All rights reserved.
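The critic/actor structure described in the abstract can be illustrated with a generic sketch. This is not the paper's algorithm: the two-player linear dynamics (`A`, `B1`, `B2`), the quadratic critic basis, the shared stage cost, and the saturation bound `u_bar` are all hypothetical choices, and the actor weight-update law (which needs model gradients) is omitted. Only the critic's gradient-descent update on the temporal-difference error and the tanh-saturated constrained control are shown.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-player discrete-time system: x_{k+1} = A x_k + B1 u1 + B2 u2
A = np.array([[0.9, 0.1], [0.0, 0.8]])
B1 = np.array([[0.1], [0.0]])
B2 = np.array([[0.0], [0.1]])
u_bar = 1.0  # actuator saturation bound: |u_i| <= u_bar

def phi(x):
    """Quadratic basis for the critic: V(x) ~ W_c . phi(x)."""
    return np.array([x[0] ** 2, x[0] * x[1], x[1] ** 2])

def actor(W, x):
    """Constrained control policy: output saturated through tanh."""
    return u_bar * np.tanh(W @ x)

W_c = np.zeros(3)                             # critic weights
W_a1 = rng.normal(scale=0.1, size=(1, 2))     # actor weights, player 1
W_a2 = rng.normal(scale=0.1, size=(1, 2))     # actor weights, player 2
lr = 0.05                                     # gradient-descent step size

for episode in range(200):
    x = rng.uniform(-1, 1, size=2)
    for k in range(20):
        u1, u2 = actor(W_a1, x), actor(W_a2, x)
        x_next = A @ x + (B1 @ u1) + (B2 @ u2)
        # Cooperative (shared) quadratic stage cost for both players
        r = x @ x + float(u1 ** 2 + u2 ** 2)
        # Temporal-difference error of the Bellman equation V(x) = r + V(x')
        td = W_c @ phi(x) - (r + W_c @ phi(x_next))
        # Gradient-descent step on 0.5 * td^2 with respect to the critic weights
        W_c -= lr * td * phi(x)
        x = x_next
```

After training, `W_c @ phi(x)` gives the learned value estimate, and the tanh wrapping guarantees the actors never violate the control constraint regardless of their weights.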
Pages: 13-19
Number of pages: 7
References
34 records
[1] [Anonymous], IEEE Transactions on Cybernetics.
[2] He H., Ni Z., Fu J. A three-network architecture for on-line learning and optimization based on adaptive dynamic programming [J]. Neurocomputing, 2012, 78(1): 3-13.
[3] Kamalapurkar R., Klotz J. R., Dixon W. E. Concurrent learning-based approximate feedback-Nash equilibrium solution of N-player nonzero-sum differential games [J]. IEEE/CAA Journal of Automatica Sinica, 2014, 1(3): 239-247.
[4] Kiumarsi B., Lewis F. L. Actor-critic-based optimal tracking for partially unknown nonlinear discrete-time systems [J]. IEEE Transactions on Neural Networks and Learning Systems, 2015, 26(1): 140-151.
[5] Lewis F. L., Vrabie D. Reinforcement learning and adaptive dynamic programming for feedback control [J]. IEEE Circuits and Systems Magazine, 2009, 9(3): 32-50.
[6] Liu D., Yang X., Wang D., Wei Q. Reinforcement-learning-based robust controller design for continuous-time uncertain nonlinear systems subject to input constraints [J]. IEEE Transactions on Cybernetics, 2015, 45(7): 1372-1385.
[7] Liu D., Li H., Wang D. Online synchronous approximate optimal learning algorithm for multiplayer nonzero-sum games with unknown dynamics [J]. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2014, 44(8): 1015-1027.
[8] Liu D., Wei Q. Policy iteration adaptive dynamic programming algorithm for discrete-time nonlinear systems [J]. IEEE Transactions on Neural Networks and Learning Systems, 2014, 25(3): 621-634.
[9] Liu D., Wang D., Zhao D., Wei Q., Jin N. Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming [J]. IEEE Transactions on Automation Science and Engineering, 2012, 9(3): 628-634.
[10] Luo B., Wu H.-N., Huang T., Liu D. Reinforcement learning solution for HJB equation arising in constrained optimal control problem [J]. Neural Networks, 2015, 71: 150-158.