Distributed cooperative reinforcement learning for multi-agent system with collision avoidance

Cited by: 3
Authors
Lan, Xuejing [1 ,2 ]
Yan, Jiapei [1 ,2 ]
He, Shude [1 ,2 ,3 ]
Zhao, Zhijia [1 ,2 ]
Zou, Tao [1 ,2 ]
Affiliations
[1] Guangzhou Univ, Sch Mech & Elect Engn, Guangzhou 510006, Peoples R China
[2] Guangdong Hong Kong Macao Key Lab Multiscale Infor, Guangzhou, Peoples R China
[3] Anhui Prov Ctr Int Res Intelligent Control High en, Wuhu, Peoples R China
Funding
China Postdoctoral Science Foundation; National Natural Science Foundation of China;
Keywords
dynamic obstacle; multi-agent system; neural network; optimal cooperative control; reinforcement learning; OPTIMAL CONSENSUS CONTROL;
DOI
10.1002/rnc.6985
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
In this work, we present an optimal cooperative control scheme for a multi-agent system operating in an unknown dynamic obstacle environment, based on an improved distributed cooperative reinforcement learning (RL) strategy with a three-layer collaborative mechanism. The three layers are the collaborative perception layer, the collaborative control layer, and the collaborative evaluation layer. Collaborative perception expands the perception range of each individual agent and improves the agents' early-warning capability against obstacles. Neural networks (NNs) are employed to approximate the cost function and the optimal controller of each agent, and the NN weight matrices are collaboratively optimized to achieve globally optimal performance. The distinctive feature of the proposed control strategy is that cooperation among the agents is embodied not only in the NN inputs (the collaborative perception layer) but also in the weight-updating procedure (the collaborative evaluation and collaborative control layers). Comparative simulations demonstrate the effectiveness and performance of the proposed RL-based cooperative control scheme.
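Since the record provides only the abstract, the following is a minimal illustrative sketch of the three-layer idea it describes, not the authors' algorithm. Each agent feeds stacked neighborhood states into its NN (collaborative perception) and mixes its neighbors' critic and actor weights into its own temporal-difference update (collaborative evaluation and control). The dynamics, network sizes, gains, the crude actor-update rule, and the omission of obstacle-avoidance terms are all assumptions made for brevity; the paper derives its updates from optimality conditions instead.

```python
# Minimal sketch (illustrative assumptions throughout):
# - collaborative perception: each agent's NN input stacks the states it can
#   sense (the graph here is complete, so every agent sees all states);
# - collaborative evaluation/control: each agent's critic/actor update adds a
#   neighbor weight-mixing (consensus) term on top of a local TD step.
import numpy as np

rng = np.random.default_rng(0)
N, D, H = 3, 2, 16                    # agents, state dim, hidden units (assumed)
ADJ = np.ones((N, N)) - np.eye(N)     # complete communication graph
DT, LR, MIX, GAMMA = 0.05, 0.02, 0.2, 0.95

PHI = rng.normal(size=(H, N * D))     # fixed random hidden layer; only the
                                      # output weights below are learned
Wc = rng.normal(scale=0.05, size=(N, H))      # critic weights (one row/agent)
Wa = rng.normal(scale=0.05, size=(N, D, H))   # actor weights
x = rng.normal(size=(N, D))           # single-integrator agent states

def nn_input(x):
    # with a sparser graph, agent i would stack only its neighbors' states
    return np.tanh(PHI @ x.reshape(-1))

for step in range(2000):
    f = nn_input(x)
    u = np.clip(np.array([Wa[i] @ f for i in range(N)]), -2.0, 2.0)
    goal = x.mean(axis=0)                                   # consensus point
    cost = ((x - goal) ** 2).sum(axis=1) + 0.1 * (u ** 2).sum(axis=1)
    x_next = x + DT * u                                     # integrator step
    f_next = nn_input(x_next)
    for i in range(N):
        td = cost[i] + GAMMA * Wc[i] @ f_next - Wc[i] @ f   # TD error
        deg = ADJ[i].sum()
        # collaborative evaluation: local TD step + neighbor weight mixing
        Wc[i] += -LR * td * f + MIX * (ADJ[i] @ (Wc - Wc[i])) / deg
        # collaborative control: crude TD-weighted actor step (illustrative
        # only), again mixed with the neighbors' actor weights
        Wa[i] += -LR * td * np.outer(u[i], f) \
                 + MIX * np.tensordot(ADJ[i], Wa - Wa[i], axes=1) / deg
    x = x_next

print("final spread around centroid:", np.linalg.norm(x - x.mean(axis=0)))
```

The consensus term drives all agents' weight matrices toward a common value, which is one simple way to read the abstract's claim that the weights are "collaboratively optimized": cooperation enters both the NN input (stacked states) and the weight-updating procedure (neighbor mixing).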
Pages: 567-585
Page count: 19