Q-learning solution for optimal consensus control of discrete-time multiagent systems using reinforcement learning

Cited by: 59
Authors
Mu, Chaoxu [1 ]
Zhao, Qian [1 ]
Gao, Zhongke [1 ]
Sun, Changyin [2 ]
Affiliations
[1] Tianjin Univ, Sch Elect & Informat Engn, Tianjin 300072, Peoples R China
[2] Southeast Univ, Sch Automat, Nanjing 210096, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
ZERO-SUM GAMES; STABILITY ANALYSIS; TRACKING CONTROL; GRAPHICAL GAMES; NETWORKS; DELAY;
DOI
10.1016/j.jfranklin.2019.06.007
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper investigates a Q-learning scheme for the optimal consensus control of discrete-time multiagent systems. The Q-learning algorithm is carried out by reinforcement learning (RL) using system data instead of information about the system dynamics. In the multiagent system, the agents interact with each other and at least one agent can communicate with the leader directly; this communication topology is described by an algebraic graph. The objective is to make all agents synchronize with the leader and to make the performance indices reach a Nash equilibrium. On the one hand, the solutions of the optimal consensus control problem for multiagent systems are obtained by solving the coupled Hamilton-Jacobi-Bellman (HJB) equation. However, it is difficult to obtain analytical solutions of the discrete-time HJB equation directly. On the other hand, accurate mathematical models of most real-world systems are hard to obtain. To overcome these difficulties, a Q-learning algorithm is developed that uses system data rather than an accurate system model. We formulate the performance index and the corresponding Bellman equation of each agent i. Then, the Q-function Bellman equation is derived from the Q-function. Policy iteration is adopted to compute the optimal control iteratively, and the least squares (LS) method is employed to facilitate the implementation. A stability analysis of the proposed policy-iteration Q-learning algorithm for multiagent systems is given. Two simulation examples verify the effectiveness of the proposed scheme. (C) 2019 The Franklin Institute. Published by Elsevier Ltd. All rights reserved.
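The policy-iteration and least-squares steps named in the abstract can be illustrated with a minimal sketch. It is not the paper's multiagent algorithm: it assumes a single agent with scalar linear dynamics and a quadratic stage cost, with no communication graph or leader. All names and parameter values (`q_learning_pi`, `a`, `b`, `q`, `r`, `k0`) are hypothetical, and the simulated transition stands in for measured system data. Only the general idea is shown: evaluate the current policy by fitting a quadratic Q-function to data with least squares, then improve the policy by minimizing that Q-function over the control.

```python
import random


def solve_3x3(A, b):
    """Solve the 3x3 linear system A z = b by Gaussian elimination with pivoting."""
    n = 3
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda row: abs(M[row][c]))
        M[c], M[p] = M[p], M[c]
        for row in range(c + 1, n):
            f = M[row][c] / M[c][c]
            for j in range(c, n + 1):
                M[row][j] -= f * M[c][j]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z


def features(x, u):
    # Assumed quadratic form: Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2
    return [x * x, 2.0 * x * u, u * u]


def q_learning_pi(a=1.1, b=0.5, q=1.0, r=1.0, k0=-1.0,
                  iters=10, samples=50, seed=0):
    """Policy iteration on a Q-function fitted by least squares (illustrative).

    The learner only sees (x, u, cost, x_next) tuples; a and b are used
    solely to simulate the data. k0 must be stabilizing: |a + b*k0| < 1.
    """
    rng = random.Random(seed)
    k = k0  # current policy u = k * x
    for _ in range(iters):
        # Policy evaluation: accumulate the normal equations of the
        # Q-function Bellman equation Q(x, u) = cost + Q(x_next, k*x_next).
        G = [[0.0] * 3 for _ in range(3)]
        y = [0.0] * 3
        for _ in range(samples):
            x = rng.uniform(-1.0, 1.0)
            u = k * x + rng.uniform(-0.5, 0.5)  # exploration around the policy
            x_next = a * x + b * u              # stands in for a measurement
            cost = q * x * x + r * u * u
            phi = features(x, u)
            phi_next = features(x_next, k * x_next)
            d = [p - pn for p, pn in zip(phi, phi_next)]
            for i in range(3):
                y[i] += d[i] * cost
                for j in range(3):
                    G[i][j] += d[i] * d[j]
        h_xx, h_xu, h_uu = solve_3x3(G, y)
        # Policy improvement: minimize the fitted Q-function over u.
        k = -h_xu / h_uu
    return k
```

Calling `q_learning_pi()` returns the learned feedback gain; under these assumptions it should approach the gain given by the scalar discrete-time Riccati equation, which can serve as a model-based cross-check.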
Pages: 6946-6967
Number of pages: 22
Related papers
50 in total
  • [1] Optimal Consensus Control for Discrete-Time Systems with State Delay Using Q-learning Solution
    Zhang, Li
    Huo, Shicheng
    Zhang, Ya
    2022 IEEE 17TH INTERNATIONAL CONFERENCE ON CONTROL & AUTOMATION, ICCA, 2022 : 630 - 635
  • [2] Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics
    Kiumarsi, Bahare
    Lewis, Frank L.
    Modares, Hamidreza
    Karimpour, Ali
    Naghibi-Sistani, Mohammad-Bagher
    AUTOMATICA, 2014, 50 (04) : 1167 - 1175
  • [3] GrHDP Solution for Optimal Consensus Control of Multiagent Discrete-Time Systems
    Zhong, Xiangnan
    He, Haibo
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (07) : 2362 - 2374
  • [4] General Second-Order Consensus of Discrete-Time Multiagent Systems via Q-Learning Method
    Liu, Yifan
    Su, Housheng
    IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2022, 52 (03) : 1417 - 1425
  • [6] Model-free optimal tracking control for discrete-time system with delays using reinforcement Q-learning
    Liu, Yang
    Yu, Rui
    ELECTRONICS LETTERS, 2018, 54 (12) : 750 - 751
  • [7] Discrete-Time Optimal Control Scheme Based on Q-Learning Algorithm
    Wei, Qinglai
    Liu, Derong
    Song, Ruizhuo
    2016 SEVENTH INTERNATIONAL CONFERENCE ON INTELLIGENT CONTROL AND INFORMATION PROCESSING (ICICIP), 2016 : 125 - 130
  • [8] Neighbor Q-learning based consensus control for discrete-time multi-agent systems
    Zhu, Xiaoxia
    Yuan, Xin
    Dong, Lu
    Wang, Yuanda
    Sun, Changyin
    OPTIMAL CONTROL APPLICATIONS & METHODS, 2023, 44 (03) : 1475 - 1490
  • [9] Reinforcement Q-learning algorithm for H∞ tracking control of discrete-time Markov jump systems
    Shi, Jiahui
    He, Dakuo
    Zhang, Qiang
    INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2025, 56 (03) : 502 - 523
  • [10] Reinforcement Q-Learning Algorithm for H∞ Tracking Control of Unknown Discrete-Time Linear Systems
    Peng, Yunjian
    Chen, Qian
    Sun, Weijie
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (11) : 4109 - 4122