Distributed Multiagent Reinforcement Learning Based on Graph-Induced Local Value Functions

Cited by: 2
Authors
Jing, Gangshan [1 ]
Bai, He [2 ]
George, Jemin [3 ]
Chakrabortty, Aranya [4 ]
Sharma, Piyush K. [3 ]
Affiliations
[1] Chongqing Univ, Sch Automat, Chongqing 400044, Peoples R China
[2] Oklahoma State Univ, Stillwater, OK 74078 USA
[3] US Army Res Lab, Adelphi, MD 20783 USA
[4] North Carolina State Univ, Raleigh, NC 27695 USA
Funding
US National Science Foundation; National Natural Science Foundation of China;
Keywords
Couplings; Heuristic algorithms; Convergence; Approximation algorithms; Scalability; Reinforcement learning; Indexes; Distributed learning; Markov decision process; multiagent systems; optimal control; reinforcement learning;
DOI
10.1109/TAC.2024.3375248
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Achieving distributed reinforcement learning (RL) for large-scale cooperative multiagent systems (MASs) is challenging because: 1) each agent has access to only limited information, and 2) scalability and sample-efficiency issues emerge due to the curse of dimensionality. In this article, we propose a general distributed framework for sample-efficient cooperative multiagent reinforcement learning (MARL) that exploits the structures of the graphs involved in this problem. We introduce three coupling graphs describing three types of interagent couplings in MARL, namely, the state graph, the observation graph, and the reward graph. By further considering a communication graph, we propose two distributed RL approaches based on local value functions derived from the coupling graphs. The first approach reduces sample complexity significantly under specific conditions on the aforementioned four graphs. The second approach provides an approximate solution and can be efficient even for problems with dense coupling graphs; here, there is a tradeoff between minimizing the approximation error and reducing the computational complexity. Simulations show that our RL algorithms offer significantly better scalability to large-scale MASs than centralized and consensus-based distributed RL algorithms.
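The abstract describes deriving, for each agent, a local value function over the joint state of the agents it is coupled with through the state, observation, and reward graphs. The Python sketch below illustrates that idea only in broad strokes and under stated assumptions: the neighborhood construction (a one-hop union of the three coupling graphs), the tabular TD(0) update, and all names such as local_neighborhoods and LocalValueTable are hypothetical choices made for illustration, not the authors' algorithm.

"""Illustrative sketch only (not the paper's method): derive a local value
function per agent from the union of three coupling graphs and update it
with tabular TD(0) using only locally available information."""

from collections import defaultdict
from itertools import chain


def local_neighborhoods(n_agents, state_edges, obs_edges, reward_edges):
    """Assumption: each agent's local value depends on the agents adjacent
    to it in the union of the three coupling graphs (one-hop union)."""
    nbrs = {i: {i} for i in range(n_agents)}
    for i, j in chain(state_edges, obs_edges, reward_edges):
        nbrs[i].add(j)
        nbrs[j].add(i)
    return {i: tuple(sorted(s)) for i, s in nbrs.items()}


class LocalValueTable:
    """Tabular value function over the joint state of one agent's neighborhood."""

    def __init__(self, neighborhood, alpha=0.1, gamma=0.95):
        self.neighborhood = neighborhood      # tuple of agent indices
        self.alpha, self.gamma = alpha, gamma
        self.table = defaultdict(float)       # local joint state -> value

    def local_state(self, global_state):
        # Project the global state onto this agent's neighborhood.
        return tuple(global_state[j] for j in self.neighborhood)

    def td_update(self, s, r, s_next):
        """One TD(0) step driven by the agent's local reward and local states."""
        x, x_next = self.local_state(s), self.local_state(s_next)
        target = r + self.gamma * self.table[x_next]
        self.table[x] += self.alpha * (target - self.table[x])


if __name__ == "__main__":
    # Toy example: 4 agents on a line; all three coupling graphs coincide.
    edges = [(0, 1), (1, 2), (2, 3)]
    nbrs = local_neighborhoods(4, edges, edges, edges)
    tables = {i: LocalValueTable(nbrs[i]) for i in nbrs}
    s, s_next = (0, 1, 0, 1), (1, 1, 0, 0)
    for i, tab in tables.items():
        tab.td_update(s, r=1.0, s_next=s_next)
    print({i: dict(t.table) for i, t in tables.items()})

In this toy run, each agent's table is keyed only by the states of its graph neighbors, so its size is governed by the neighborhood rather than by the total number of agents, which is the kind of scalability benefit the abstract points to.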
Pages: 6636-6651
Page count: 16