Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning

Cited by: 0

Authors:

Xiaoxiao Zhao

Peng Yi

Li Li

Affiliations:

[1] Tongji University, College of Electronic and Information Engineering

[2] Tongji University, Institute of Intelligent Science and Technology

Source:

Control Theory and Technology | 2020 / Vol. 18

Keywords:

Multi-agent system; Reinforcement learning; Distributed optimization; Policy evaluation

DOI:

Not available

Abstract:

This paper studies distributed policy evaluation in multi-agent reinforcement learning. In the cooperative setting, each agent observes only a local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors over a communication network. The mean squared projected Bellman error minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint, and a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step for ADMM is used to achieve efficient computation at each iteration. The convergence of the proposed algorithm is established.
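The general idea of consensus ADMM with an inexact primal step can be illustrated with a small sketch. This is not the paper's algorithm: it assumes each agent holds a toy quadratic `0.5 * ||A @ theta - b_i||^2` as a stand-in for a local projected-Bellman-error term, and it uses a linearized update (one gradient step plus a proximal weight `rho`) as one common way to avoid solving the local subproblem exactly. The function name, the matrix `A`, the vectors `b_list`, the ring graph, and all parameter values are hypothetical.

```python
import numpy as np

def inexact_consensus_admm(A, b_list, neighbors, c=1.0, iters=2000):
    """Linearized consensus ADMM on toy local quadratics (illustrative only)."""
    n_agents = len(b_list)
    dim = A.shape[1]
    # Assumption: proximal weight set to the gradient's Lipschitz constant.
    rho = np.linalg.eigvalsh(A.T @ A).max()
    theta = np.zeros((n_agents, dim))   # one local copy of the parameter per agent
    dual = np.zeros((n_agents, dim))    # one dual variable per agent
    for _ in range(iters):
        new_theta = np.empty_like(theta)
        for i in range(n_agents):
            # Gradient of the local objective at the current local copy.
            grad = A.T @ (A @ theta[i] - b_list[i])
            # Disagreement with neighbors (the only communication needed).
            disagree = sum(theta[i] - theta[j] for j in neighbors[i])
            step = 1.0 / (2.0 * c * len(neighbors[i]) + rho)
            # Inexact primal step: a single gradient-type update.
            new_theta[i] = theta[i] - step * (grad + dual[i] + c * disagree)
        theta = new_theta
        for i in range(n_agents):
            # Dual ascent on the consensus constraint.
            dual[i] += c * sum(theta[i] - theta[j] for j in neighbors[i])
    return theta

# Hypothetical problem data: 4 agents on a ring, 2-dimensional parameter.
A = np.array([[2.0, 0.5], [0.5, 1.0]])
b_list = [np.array(v) for v in ([1.0, 0.0], [0.0, 1.0], [2.0, 1.0], [1.0, 2.0])]
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}

theta = inexact_consensus_admm(A, b_list, neighbors)
# Centralized minimizer of the summed objective, for comparison.
theta_star = np.linalg.solve(A, np.mean(b_list, axis=0))
print("max deviation from centralized solution:", np.abs(theta - theta_star).max())
```

In this toy setting every agent's local copy should approach the centralized minimizer, even though each agent only uses its own `b_i` and its neighbors' current iterates.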

Pages: 362-378

Page count: 16
