Distributed cooperative reinforcement learning for multi-agent system with collision avoidance

Cited by: 4

Authors:

Lan, Xuejing [1,2]

Yan, Jiapei [1,2]

He, Shude [1,2,3]

Zhao, Zhijia [1,2]

Zou, Tao [1,2]

Affiliations:

[1] Guangzhou Univ, Sch Mech & Elect Engn, Guangzhou 510006, Peoples R China

[2] Guangdong Hong Kong Macao Key Lab Multiscale Infor, Guangzhou, Peoples R China

[3] Anhui Prov Ctr Int Res Intelligent Control High en, Wuhu, Peoples R China

Source:

INTERNATIONAL JOURNAL OF ROBUST AND NONLINEAR CONTROL | 2024 / Vol. 34 / No. 01

Funding:

National Natural Science Foundation of China; China Postdoctoral Science Foundation;

Keywords:

dynamic obstacle; multi-agent system; neural network; optimal cooperative control; reinforcement learning; OPTIMAL CONSENSUS CONTROL;

DOI:

10.1002/rnc.6985

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

In this work, we present an optimal cooperative control scheme for a multi-agent system in an unknown dynamic obstacle environment, based on an improved distributed cooperative reinforcement learning (RL) strategy with a three-layer collaborative mechanism. The three layers are the collaborative perception layer, the collaborative control layer, and the collaborative evaluation layer. Collaborative perception expands the perception range of each individual agent and improves the agents' early warning of obstacles. Neural networks (NNs) are employed to approximate the cost function and the optimal controller of each agent, and the NN weight matrices are collaboratively optimized to achieve globally optimal performance. The distinguishing feature of the proposed control strategy is that cooperation among the agents is embodied not only in the input of the NNs (in the collaborative perception layer) but also in their weight-updating procedure (in the collaborative evaluation and collaborative control layers). Comparative simulations are carried out to demonstrate the effectiveness and performance of the proposed RL-based cooperative control scheme.


Pages: 567-585

Page count: 19

