Cooperative Learning of Multi-Agent Systems Via Reinforcement Learning

Cited by: 29

Authors:

Wang, Xin [1]

Zhao, Chen [1]

Huang, Tingwen [2]

Chakrabarti, Prasun [3]

Kurths, Juergen [4,5]

Affiliations:

[1] Southwest Univ, Coll Elect & Informat Engn, Chongqing Key Lab Nonlinear Circuits & Intelligent, Chongqing 400075, Peoples R China

[2] Texas A&M Univ Qatar, Doha 23874, Qatar

[3] ITM SLS Baroda Univ, Vadodara 391510, Gujarat, India

[4] Potsdam Inst Climate Impact Res, D-14473 Potsdam, Germany

[5] Humboldt Univ, Inst Phys, D-12489 Berlin, Germany

Source:

IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS | 2023 / Vol. 9

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-agent systems; Reinforcement learning; Artificial neural networks; Neural networks; Actuators; Information processing; Behavioral sciences; cooperative learning; multi-agent systems; reinforcement learning; GAMES;

DOI:

10.1109/TSIPN.2023.3239654

Chinese Library Classification (CLC) codes:

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In many specific scenarios, accurate and practical cooperative learning is a commonly encountered challenge in multi-agent systems. Thus, the current investigation focuses on cooperative learning algorithms for multi-agent systems and underpins an alternate data-based neural network reinforcement learning framework. To achieve the data-based learning optimization, the proposed cooperative learning framework, which comprises two layers, introduces a virtual learning objective. The followers learn the behaviors of the virtual objects in the first layer based on the adaptive neural networks (NNs). Specifically, the actor and critic NNs are applied to acquire cooperative behaviors and assess this layer's long-term utility function. Then another layer realizes the tracking performance between the virtual objects and the leader by introducing the local data-based performance index. Then, we formulate a resulting deterministic optimization problem and resolve it effectively with the policy iteration algorithm. This intuitive cooperative learning algorithm also preserves good robustness properties and eliminates the dependence on the prior knowledge of the multi-agent system model in the solution process. Finally, a multi-robot formation system demonstrates this promising development's practical appeal and highly effective outcome.

Pages: 13-23

Page count: 11

References (39 in total):

[1] Bai, Weiwei; Li, Tieshan; Tong, Shaocheng. NN Reinforcement Learning Adaptive Control for a Class of Nonstrict-Feedback Discrete-Time Systems [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2020, 50(11): 4573-4584.

[2] Bai, Weiwei; Li, Tieshan; Long, Yue; Chen, C. L. Philip. Event-Triggered Multigradient Recursive Reinforcement Learning Tracking Control for Multiagent Systems [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2023, 34(01): 366-379.

[3] Deng, Zhen; Guan, Haojun; Huang, Rui; Liang, Hongzhuo; Zhang, Liwei; Zhang, Jianwei. Combining Model-Based Q-Learning With Structural Knowledge Transfer for Robot Skill Learning [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2019, 11(01): 26-35.

[4] Fernandez, Alberto; Ossowski, Sascha. A Multiagent Approach to the Dynamic Enactment of Semantic Transportation Services [J]. IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2011, 12(02): 333-342.

[5] Gao, Tingting; Li, Tieshan; Liu, Yan-Jun; Tong, Shaocheng. IBLF-Based Adaptive Neural Control of State-Constrained Uncertain Stochastic Nonlinear Systems [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33(12): 7345-7356.

[6] Gao, Tingting; Liu, Yan-Jun; Li, Dapeng; Tong, Shaocheng; Li, Tieshan. Adaptive Neural Control Using Tangent Time-Varying BLFs for a Class of Uncertain Stochastic Nonlinear Systems With Full State Constraints [J]. IEEE TRANSACTIONS ON CYBERNETICS, 2021, 51(04): 1943-1953.

[7] Gentz, Reinhard; Wu, Sissi Xiaoxiao; Wai, Hoi-To; Scaglione, Anna; Leshem, Amir. Data Injection Attacks in Randomized Gossiping [J]. IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2016, 2(04): 523-538.

[8] Kailkhura, Bhavya; Brahma, Swastik; Varshney, Pramod K. Data Falsification Attacks on Consensus-Based Detection Systems [J]. IEEE TRANSACTIONS ON SIGNAL AND INFORMATION PROCESSING OVER NETWORKS, 2017, 3(01): 145-158.

[9] Khoo, Suiyang; Xie, Lihua; Man, Zhihong. Robust Finite-Time Consensus Tracking Algorithm for Multirobot Systems [J]. IEEE-ASME TRANSACTIONS ON MECHATRONICS, 2009, 14(02): 219-228.

[10] Kumar, Medha; Dutt, Varun. Understanding Decisions in Collective Risk Social Dilemma Games Using Reinforcement Learning [J]. IEEE TRANSACTIONS ON COGNITIVE AND DEVELOPMENTAL SYSTEMS, 2020, 12(04): 824-840.
