Distributed Reinforcement Learning via Gossip

Cited by: 39
Authors
Mathkar, Adwaitvedant [1 ]
Borkar, Vivek S. [1 ]
Affiliations
[1] Indian Inst Technol, Dept Elect Engn, Bombay 400076, Maharashtra, India
Keywords
Distributed algorithm; gossip; reinforcement learning; stochastic approximation; TD(0)
DOI
10.1109/TAC.2016.2585302
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We consider the classical TD(0) algorithm implemented on a network of agents wherein the agents also incorporate updates received from neighboring agents using a gossip-like mechanism. The combined scheme is shown to converge for both discounted and average cost problems.
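The idea in the abstract can be illustrated with a minimal sketch for the discounted-cost case. This is not the authors' exact scheme: the tabular representation, the synchronous gossip round, the step-size rule `visits ** -0.75`, the weight matrix `W`, and the names `gossip_td0`, `P`, and `cost` are all assumptions made for illustration. Each agent simulates the same Markov chain independently, averages its value estimates with those of its neighbors, and then applies a local TD(0) correction at the state it just visited.

```python
import random


def gossip_td0(P, cost, W, gamma=0.5, steps=20000, seed=0):
    """Tabular TD(0) run by several agents with a gossip averaging step.

    P     : row-stochastic transition matrix of the chain under a fixed policy
    cost  : per-state cost
    W     : row-stochastic gossip weights; W[i][j] > 0 only if j is i or a neighbor of i
    gamma : discount factor (illustrative value, not taken from the paper)
    """
    rng = random.Random(seed)
    n_states, n_agents = len(P), len(W)
    V = [[0.0] * n_states for _ in range(n_agents)]
    state = [rng.randrange(n_states) for _ in range(n_agents)]
    visits = [[0] * n_states for _ in range(n_agents)]

    for _ in range(steps):
        # Gossip step: every agent takes a weighted average of the
        # current estimates held by itself and its neighbors.
        mixed = [
            [sum(W[i][j] * V[j][s] for j in range(n_agents)) for s in range(n_states)]
            for i in range(n_agents)
        ]
        # Local TD(0) step: every agent observes one transition of its own
        # copy of the chain and corrects the estimate of the visited state.
        for i in range(n_agents):
            s = state[i]
            s_next = rng.choices(range(n_states), weights=P[s])[0]
            visits[i][s] += 1
            step = visits[i][s] ** -0.75  # assumed diminishing step size
            td_error = cost[s] + gamma * V[i][s_next] - V[i][s]
            mixed[i][s] += step * td_error
            state[i] = s_next
        V = mixed
    return V


# Hypothetical two-state chain and three agents on a line graph.
P = [[0.9, 0.1], [0.2, 0.8]]
cost = [1.0, 0.0]
W = [[0.5, 0.5, 0.0], [0.25, 0.5, 0.25], [0.0, 0.5, 0.5]]

V = gossip_td0(P, cost, W)
for i, row in enumerate(V):
    print(f"agent {i}: {[round(v, 3) for v in row]}")
```

For this toy chain the discounted value function can be computed by hand from V = cost + gamma * P * V, which gives a reference point for checking that the agents' estimates agree with each other and with the true values.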
Pages: 1465-1470
Number of pages: 6
Related papers
18 in total
[1]  
[Anonymous], 2000, P 17 INT C MACH LEAR
[2]  
[Anonymous], 2009, Stochastic Approximation: A Dynamical Systems Viewpoint
[3]  
Bertsekas D. P., 2012, DYNAMIC PROGRAMMING, VII
[4]  
Borkar V. S., 1998, SIAM J CONTROL OPTIM, V38, P662
[5]   A comprehensive survey of multiagent reinforcement learning [J].
Busoniu, Lucian ;
Babuska, Robert ;
De Schutter, Bart .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2008, 38 (02) :156-172
[6]  
Chang Yu-han, 2003, Advances in Neural Information Processing Systems 17, NIPS'03, V16, P807
[7]   QD-Learning: A Collaborative Distributed Strategy for Multi-Agent Reinforcement Learning Through Consensus plus Innovations [J].
Kar, Soummya ;
Moura, Jose M. F. ;
Poor, H. Vincent .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2013, 61 (07) :1848-1862
[8]  
LITTMAN M, 1993, NEURAL NETW INNS, P45
[9]  
Macua S. V., 2012, P IEEE INT WORKSH CO, P1
[10]  
Pendrith M. D., 2000, Proceedings of the Fourth International Conference on Autonomous Agents, P404, DOI 10.1145/336595.337554