Distributed Reinforcement Learning via Gossip

Times cited: 39

Authors:

Mathkar, Adwaitvedant [1]

Borkar, Vivek S. [1]

Affiliation:

[1] Indian Inst Technol, Dept Elect Engn, Bombay 400076, Maharashtra, India

Source:

IEEE TRANSACTIONS ON AUTOMATIC CONTROL | 2017 / Vol. 62 / No. 03

Keywords:

Distributed algorithm; gossip; reinforcement learning; stochastic approximation; TD(0)

DOI:

10.1109/TAC.2016.2585302

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate updates received from neighboring agents using a gossip-like mechanism. The combined scheme is shown to converge for both discounted and average cost problems.
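To make the scheme in the abstract concrete, here is a minimal Python/NumPy sketch of discounted-cost TD(0) with linear function approximation, in which every agent first averages its neighbours' parameter vectors through a mixing matrix `W` and then applies its own TD(0) correction. This is an illustrative assumption, not the paper's exact algorithm: the function name `gossip_td0`, the step-size schedule (n+1)^(-0.6), the independent per-agent trajectories and the toy three-state chain are all hypothetical, and a fixed, synchronous, doubly stochastic averaging matrix stands in for a randomized gossip protocol. The average-cost variant is not shown.

```python
import numpy as np


def gossip_td0(P, cost, features, W, gamma=0.9, steps=20_000, seed=0):
    """Hypothetical sketch: TD(0) with linear features, run by N gossiping agents.

    P        : (S, S) transition matrix of the chain under the evaluated policy
    cost     : (S,) one-step cost c(x)
    features : (S, d) feature matrix; the value estimate is features @ theta
    W        : (N, N) mixing matrix; row i holds the weights agent i places
               on its neighbours' parameters (assumed doubly stochastic)
    Returns an (N, d) array with one parameter vector per agent.
    """
    rng = np.random.default_rng(seed)
    S, d = features.shape
    N = W.shape[0]
    cum_P = np.cumsum(P, axis=1)        # row-wise CDFs for sampling next states
    theta = np.zeros((N, d))            # theta[i] is agent i's parameter vector
    states = rng.integers(S, size=N)    # assumption: each agent has its own trajectory

    for n in range(steps):
        a_n = (n + 1) ** -0.6           # diminishing step size (assumed schedule)

        # Sample X_i(n+1) ~ P(X_i(n), .) for every agent at once via inverse CDF.
        u = rng.random(N)
        next_states = np.minimum((u[:, None] > cum_P[states]).sum(axis=1), S - 1)

        phi, phi_next = features[states], features[next_states]
        # Local TD(0) error: c(x) + gamma * phi(x')^T theta_i - phi(x)^T theta_i
        td_err = (cost[states]
                  + gamma * np.sum(phi_next * theta, axis=1)
                  - np.sum(phi * theta, axis=1))

        # Gossip/averaging step (W @ theta) plus each agent's local TD(0) correction.
        theta = W @ theta + a_n * td_err[:, None] * phi
        states = next_states
    return theta


# Toy usage (hypothetical data): 3-state chain, 4 agents on a ring.
P = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])
cost = np.array([1.0, 0.0, 2.0])
eye4 = np.eye(4)
ring = 0.5 * eye4 + 0.25 * (np.roll(eye4, 1, axis=1) + np.roll(eye4, -1, axis=1))
theta = gossip_td0(P, cost, np.eye(3), ring, gamma=0.5)
v_true = np.linalg.solve(np.eye(3) - 0.5 * P, cost)   # exact discounted value
```

With identity features the setting is tabular, so the agents' averaged estimate should settle near the exact discounted value (I - gamma P)^(-1) c, while the ring matrix keeps pulling the individual parameter vectors toward consensus.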


Pages: 1465-1470

Number of pages: 6
