Distributed policy evaluation via inexact ADMM in multi-agent reinforcement learning

Times cited: 16

Authors:

Zhao, Xiaoxiao [1]

Yi, Peng [1,2]

Li, Li [1,2]

Affiliations:

[1] Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China

[2] Tongji Univ, Inst Intelligent Sci & Technol, Shanghai 201203, Peoples R China

Source:

CONTROL THEORY AND TECHNOLOGY | 2020 / Vol. 18 / No. 4

Keywords:

Multi-agent system; Reinforcement learning; Distributed optimization; Policy evaluation

DOI:

10.1007/s11768-020-00007-x

CLC number (Chinese Library Classification):

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

This paper studies distributed policy evaluation in multi-agent reinforcement learning. In the cooperative setting considered, each agent observes only its own local reward, while all agents share a common environmental state. To optimize the global return, defined as the sum of the local returns, the agents exchange information with their neighbors over a communication network. The mean squared projected Bellman error (MSPBE) minimization problem is reformulated as a constrained convex optimization problem with a consensus constraint, and a distributed alternating direction method of multipliers (ADMM) algorithm is proposed to solve it. Furthermore, an inexact step is used within ADMM so that the computation at each iteration is efficient. The convergence of the proposed algorithm is established.
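The consensus reformulation described in the abstract can be illustrated with a small toy sketch. Everything below is an assumption made for illustration, not the paper's algorithm. The sketch uses linear (here tabular) value features and a synthetic symmetric Markov chain P, so the stationary distribution D is uniform. Agent i holds the local objective f_i(theta) = 0.5 * ||A theta - N b_i||^2 in the C^{-1} norm, with shared A = Phi^T D (I - gamma P) Phi and C = Phi^T D Phi (common state) and local b_i = Phi^T D r_i (local reward). With this choice the sum of the f_i is minimized by the value function of the summed reward. The "inexact" step replaces each agent's exact ADMM subproblem with a single linearized gradient step, in the style of decentralized linearized ADMM. This is one common inexact variant and not necessarily the one analyzed in the paper. The function names local_mspbe_terms and inexact_dadmm are hypothetical.

```python
import numpy as np


def local_mspbe_terms(P, gamma, rewards):
    """Matrices of the linear MSPBE 0.5 * ||A theta - b||^2_{C^{-1}}.

    Assumed toy setup (not from the paper): tabular features (Phi = I) and a
    doubly stochastic P, so the stationary distribution D is uniform.
    """
    n = P.shape[0]
    D = np.eye(n) / n                    # uniform stationary distribution
    A = D @ (np.eye(n) - gamma * P)      # A = Phi^T D (I - gamma P) Phi
    C = D                                # C = Phi^T D Phi
    b = [D @ r for r in rewards]         # b_i = Phi^T D r_i, one per agent
    return A, C, b


def inexact_dadmm(A, C, b, neighbors, c=0.25, iters=1000):
    """Decentralized consensus ADMM with a linearized (one-gradient) primal step.

    Agent i holds f_i(theta) = 0.5 * ||A theta - N b_i||^2_{C^{-1}}; the sum of
    the f_i is minimized by the value function of the summed reward.
    """
    N, k = len(b), A.shape[1]
    Cinv = np.linalg.inv(C)
    H = A.T @ Cinv @ A                         # Hessian shared by all agents
    g = [N * A.T @ Cinv @ bi for bi in b]      # agent-specific linear terms
    rho = np.linalg.eigvalsh(H).max()          # proximal weight = Lipschitz const.
    theta = np.zeros((N, k))                   # one local estimate per agent
    alpha = np.zeros((N, k))                   # duals; their sum stays zero
    for _ in range(iters):
        new = np.empty_like(theta)
        for i, nbrs in enumerate(neighbors):
            d_i = len(nbrs)
            grad = H @ theta[i] - g[i]         # gradient of f_i at the old iterate
            nbr_sum = theta[nbrs].sum(axis=0)  # only neighbors' estimates are used
            # Closed-form minimizer of the linearized augmented Lagrangian.
            new[i] = ((rho + c * d_i) * theta[i] + c * nbr_sum - grad - alpha[i]) / (
                2 * c * d_i + rho
            )
        for i, nbrs in enumerate(neighbors):
            # Dual ascent on the disagreement with neighbors.
            alpha[i] += c * (len(nbrs) * new[i] - new[nbrs].sum(axis=0))
        theta = new
    return theta


# Example: 4-state lazy random walk on a cycle, 5 agents on a ring network.
n_states, n_agents, gamma = 4, 5, 0.5
I = np.eye(n_states)
P = 0.5 * I + 0.25 * (np.roll(I, 1, axis=1) + np.roll(I, -1, axis=1))
rewards = np.random.default_rng(0).random((n_agents, n_states))
A, C, b = local_mspbe_terms(P, gamma, rewards)
ring = [[(i - 1) % n_agents, (i + 1) % n_agents] for i in range(n_agents)]
theta = inexact_dadmm(A, C, b, ring)
v_central = np.linalg.solve(I - gamma * P, rewards.sum(axis=0))
print(np.abs(theta - v_central).max())
```

In this toy, every agent's estimate should approach the centralized solution (I - gamma P)^{-1} sum_i r_i, even though each agent only sees its own reward and its two ring neighbors. Each iteration costs one matrix-vector product per agent rather than a linear solve, which is the kind of saving an inexact step aims for. The settings rho = L, c = 0.25, and gamma = 0.5 were chosen only to keep this toy well-conditioned. They are not tuned or general recommendations.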


Pages: 362-378

Number of pages: 17

References: 27
