Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning

Cited by: 59
Authors
Mu, Chaoxu [1 ]
Zhao, Qian [1 ]
Gao, Zhongke [1 ]
Sun, Changyin [2 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Jiangsu, Peoples R China
Source
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS | 2019, Vol. 356, No. 13
Funding
National Natural Science Foundation of China;
Keywords
ZERO-SUM GAMES; STABILITY ANALYSIS; TRACKING CONTROL; GRAPHICAL GAMES; NETWORKS; DELAY;
DOI
10.1016/j.jfranklin.2019.06.007
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812;
Abstract
This paper investigates a Q-learning scheme for the optimal consensus control of discrete-time multiagent systems. The Q-learning algorithm is carried out by reinforcement learning (RL) using system data instead of system dynamics information. In the multiagent system, the agents interact with each other and at least one agent can communicate with the leader directly, which is described by an algebraic graph structure. The objective is to make all agents achieve synchronization with the leader and to drive the performance indices to a Nash equilibrium. On one hand, the optimal consensus control solutions for multiagent systems are obtained by solving the coupled Hamilton-Jacobi-Bellman (HJB) equation; however, it is difficult to obtain analytical solutions of the discrete-time HJB equation directly. On the other hand, accurate mathematical models of most real-world systems are hard to obtain. To overcome these difficulties, a Q-learning algorithm is developed using system data rather than an accurate system model. We formulate the performance index and the corresponding Bellman equation of each agent i. Then, the Q-function Bellman equation is derived on the basis of the Q-function. Policy iteration is adopted to compute the optimal control iteratively, and the least-squares (LS) method is employed to facilitate the implementation. A stability analysis of the proposed policy-iteration-based Q-learning algorithm for multiagent systems is given. Two simulation examples are presented to verify the effectiveness of the proposed scheme. (C) 2019 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
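To illustrate the flavor of the data-driven scheme described in the abstract, the following is a minimal single-agent sketch of policy-iteration Q-learning with a least-squares (LS) estimate of a quadratic Q-function. The system matrices A and B, the weights Qw and Rw, and all other names are illustrative assumptions; the paper's graph-coupled multiagent formulation, coupled HJB equations, and Nash-equilibrium analysis are not reproduced here.

```python
import numpy as np

# Illustrative sketch only: a single-agent, linear-quadratic stand-in for
# data-driven policy-iteration Q-learning with least-squares Q-function
# estimation.  A, B, Qw, Rw below are assumed example values.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [0.1]])
Qw, Rw = np.eye(2), np.eye(1)
n, m = B.shape

def quad_basis(z):
    """Upper-triangular quadratic basis so Q(x,u) = z' H z is linear in theta."""
    outer = np.outer(z, z)
    rows, cols = np.triu_indices(len(z))
    scale = np.where(rows == cols, 1.0, 2.0)   # off-diagonal terms appear twice
    return scale * outer[rows, cols]

K = np.zeros((m, n))                 # initial admissible policy u = -K x
rng = np.random.default_rng(0)

for _ in range(10):                  # policy-iteration loop
    Phi, targets = [], []
    x = rng.standard_normal(n)
    for _ in range(200):             # collect data under the current policy
        u = -K @ x + 0.1 * rng.standard_normal(m)   # exploration noise
        x_next = A @ x + B @ u
        u_next = -K @ x_next
        cost = x @ Qw @ x + u @ Rw @ u
        # Q-function Bellman equation in regression form:
        #   phi(x_k, u_k)' theta - phi(x_{k+1}, u_{k+1})' theta = cost_k
        Phi.append(quad_basis(np.concatenate([x, u]))
                   - quad_basis(np.concatenate([x_next, u_next])))
        targets.append(cost)
        x = x_next

    # Policy evaluation: LS solution of the Q-function Bellman equation.
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(targets), rcond=None)
    H = np.zeros((n + m, n + m))
    H[np.triu_indices(n + m)] = theta
    H = H + H.T - np.diag(np.diag(H))          # recover the symmetric kernel

    # Policy improvement from the Q-function kernel: u = -Huu^{-1} Hux x.
    Hux, Huu = H[n:, :n], H[n:, n:]
    K = np.linalg.solve(Huu, Hux)

print("learned feedback gain K:\n", K)
```

In the paper's setting, an analogous LS regression would presumably be set up for each agent using its graph-coupled local neighborhood tracking error rather than the single state vector used in this sketch.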
Pages: 6946-6967
Page count: 22