Data-Based Optimal Consensus Control for Multiagent Systems With Time Delays: Using Prioritized Experience Replay

Cited by: 10

Authors:

Ji, Lianghao [1,2]

Lin, Zhiqiang [1,2]

Zhang, Cuijuan [1,2]

Yang, Shasha [1,2]

Li, Jun [3]

Li, Huaqing [3]

Affiliations:

[1] Chongqing Univ Posts & Telecommun, Chongqing Key Lab Image Cognit, Chongqing 400065, Peoples R China

[2] Chongqing Univ Posts & Telecommun, Sch Comp Sci & Technol, Chongqing 400065, Peoples R China

[3] Southwest Univ, Coll Elect & Informat Engn, Chongqing Key Lab Nonlinear Circuits & Intelligen, Chongqing 400715, Peoples R China

Source:

IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS | 2024 / Vol. 54 / No. 05

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-agent systems; Multiagent systems (MASs); optimal control; prioritized experience replay (PER); reinforcement learning (RL); time delay; ECONOMIC-DISPATCH; OPTIMIZATION;

DOI:

10.1109/TSMC.2024.3358293

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

This article addresses the optimal consensus problem of multiagent systems (MASs) with time delays. By designing a new augmented state, the delayed MASs are reformulated as a delay-free system. Each agent seeks to minimize its local cost, which may depend on the decisions of the other agents; this is regarded as a Nash equilibrium problem. To this end, we propose a multiagent deterministic policy gradient (MADPG) method based on actor-critic (AC) networks that minimizes the local cost (Q-function) by introducing the policy gradient technique, and its convergence and optimality are proven as well. In particular, we develop an optimized prioritized experience replay (PER) strategy that allows high-value samples to be selected with a higher probability, which enhances the networks' data utilization. Finally, the effectiveness of the algorithm and the advantages of PER are demonstrated with a simulated example and a comparative simulation.
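The idea of selecting high-value samples with a higher probability can be illustrated with a minimal sketch of generic proportional prioritized replay. This is not the article's optimized PER strategy, whose details are not given in this record. The class name `PrioritizedReplayBuffer`, the parameters `alpha` and `eps`, and the use of the absolute temporal-difference error as the measure of sample value are all assumptions made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Generic proportional prioritized replay (illustrative only;
    not the optimized PER variant proposed in the article)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity      # maximum number of stored samples
        self.alpha = alpha            # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps                # keeps every priority strictly positive
        self.samples = []
        self.priorities = []
        self.next_index = 0           # slot to overwrite once the buffer is full

    def add(self, sample, td_error):
        # Assumption: a sample's "value" is measured by its absolute TD error.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.samples) < self.capacity:
            self.samples.append(sample)
            self.priorities.append(priority)
        else:
            self.samples[self.next_index] = sample
            self.priorities[self.next_index] = priority
        self.next_index = (self.next_index + 1) % self.capacity

    def probabilities(self):
        # Selection probability of each stored sample: p_i / sum_k p_k.
        total = sum(self.priorities)
        return [p / total for p in self.priorities]

    def sample(self, batch_size, rng=random):
        # Draw indices with replacement, weighted by priority.
        indices = rng.choices(
            range(len(self.samples)), weights=self.priorities, k=batch_size
        )
        return indices, [self.samples[i] for i in indices]
```

With `alpha=1.0` and `eps=0.0`, two stored samples with TD errors 1.0 and 3.0 are selected with probabilities 0.25 and 0.75, respectively. A full implementation would typically also apply importance-sampling weights to correct the bias that non-uniform sampling introduces.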


Pages: 3244-3256

Page count: 13

50 items in total

[1] Distributed optimal consensus control for multiagent systems based on event-triggered and prioritized experience replay strategies
Zhang, Cuijuan
Ji, Lianghao
Yang, Shasha
Guo, Xing
Li, Huaqing
SCIENCE CHINA-INFORMATION SCIENCES, 2025, 68 (01)
[2] Data-Based Optimal Consensus Control for Multiagent Systems With Policy Gradient Reinforcement Learning
Yang, Xindi
Zhang, Hao
Wang, Zhuping
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3872 - 3883
[3] Nearly Optimal Consensus Control of Discrete Time Multiagent Systems with Time Delays
Zhang, Yao
Mu, Chaoxu
Zhao, Qian
Wang, Ke
2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 72 - 77
[4] Data-Based Optimal Synchronization Control for Discrete-Time Nonlinear Heterogeneous Multiagent Systems
Fu, Hao
Chen, Xin
Wang, Wei
Wu, Min
IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (04) : 2477 - 2490
[5] Consensus of Discrete-Time Nonlinear Multiagent Systems Using Sliding Mode Control Based on Optimal Control
Yuan, Lin
Li, Jinna
IEEE ACCESS, 2022, 10 : 47275 - 47283
[6] Optimal Consensus Control Design for Multiagent Systems With Multiple Time Delay Using Adaptive Dynamic Programming
Zhang, Huaguang
Ren, He
Mu, Yunfei
Han, Ji
IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (12) : 12832 - 12842
[7] Data-based stable value iteration optimal control for unknown discrete-time systems with time delays
Ren, He
Zhang, Huaguang
Su, Hanguang
Mu, Yunfei
NEUROCOMPUTING, 2020, 382 : 96 - 105
[8] Input-Output Data-Based Output Antisynchronization Control of Multiagent Systems Using Reinforcement Learning Approach
Peng, Zhinan
Zhao, Yiyi
Hu, Jiangping
Luo, Rui
Ghosh, Bijoy Kumar
Nguang, Sing Kiong
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2021, 17 (11) : 7359 - 7367
[9] GrHDP Solution for Optimal Consensus Control of Multiagent Discrete-Time Systems
Zhong, Xiangnan
He, Haibo
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (07) : 2362 - 2374
[10] Sampled-Data Consensus for Multiagent Systems With Time Delays and Packet Losses
Xing, Mali
Deng, Feiqi
Hu, Zhipei
IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2020, 50 (01) : 203 - 210
