Distributed reinforcement learning in multi-agent networks

Cited by: 0
Authors
Kar, Soummya [1 ]
Moura, Jose M. F. [1 ]
Poor, H. Vincent [2 ]
Affiliations
[1] Carnegie Mellon Univ, Dept ECE, Pittsburgh, PA 15213 USA
[2] Princeton Univ, Dept EE, Princeton, NJ 08544 USA
Source
2013 IEEE 5TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP 2013) | 2013
Funding
U.S. National Science Foundation
Keywords
Multi-agent stochastic control; distributed Q-learning; reinforcement learning; collaborative network processing; consensus plus innovations; distributed stochastic approximation;
DOI
Not available
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Discipline code
0812
Abstract
Distributed reinforcement learning algorithms for collaborative multi-agent Markov decision processes (MDPs) are presented and analyzed. The networked setup consists of a collection of agents (learners) that respond differently (depending on their instantaneous one-stage random costs) to a global controlled state and the control actions of a remote controller. With the objective of jointly learning the optimal stationary control policy (in the absence of global state transition and local agent cost statistics) that minimizes the network-averaged infinite-horizon discounted cost, the paper presents distributed variants of Q-learning of the consensus + innovations type, in which each agent sequentially refines its learning parameters by locally processing its instantaneous payoff data and the information received from neighboring agents. Under broad conditions on the multi-agent decision model and the mean connectivity of the inter-agent communication network, the proposed distributed algorithms are shown to achieve optimal learning asymptotically, i.e., almost surely (a.s.) each network agent is shown to learn the value function and the optimal stationary control policy of the collaborative MDP asymptotically. Further, convergence rate estimates for the proposed class of distributed learning algorithms are obtained.
Pages: 296 / +
Number of pages: 2
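
To make the abstract's update rule concrete, below is a minimal Python sketch of a consensus + innovations Q-learning step of the general kind described: each agent pulls its Q-table toward its neighbors' tables (consensus) and applies a local temporal-difference correction driven by its own one-stage cost (innovation). The function name qd_step, the specific weight sequences, the toy network, and the random dynamics are illustrative assumptions for exposition, not the paper's exact algorithm.

```python
# Sketch of a consensus + innovations distributed Q-learning step.
# Assumptions (not from the paper): a finite MDP with a shared global
# state/action, synchronous updates, and the weight choices below.
import numpy as np

def qd_step(Q, agent, neighbors, s, a, s_next, cost, t, gamma=0.95):
    """Return `agent`'s updated Q-table given all agents' current tables Q.

    Q         : array (n_agents, n_states, n_actions) of current estimates
    neighbors : indices of this agent's neighbors in the communication graph
    cost      : the agent's instantaneous one-stage random cost
    t         : iteration counter driving the decaying weight sequences
    """
    alpha = 1.0 / (t + 1)          # innovation weight
    beta = 1.0 / (t + 1) ** 0.75   # consensus weight, decays more slowly

    # Consensus term: disagreement of the whole local table with neighbors.
    consensus = sum(Q[agent] - Q[l] for l in neighbors)

    # Innovation term: temporal-difference error at the visited pair (s, a),
    # using min over actions since the objective is cost minimization.
    td = cost + gamma * Q[agent, s_next].min() - Q[agent, s, a]

    new = Q[agent] - beta * consensus
    new[s, a] += alpha * td
    return new

# Toy run: 3 agents on a line graph, 4 states, 2 actions, random dynamics.
rng = np.random.default_rng(0)
Q = np.zeros((3, 4, 2))
nbrs = [[1], [0, 2], [1]]
for t in range(2000):
    s, a = int(rng.integers(4)), int(rng.integers(2))
    s_next = int(rng.integers(4))
    Q = np.stack([qd_step(Q, n, nbrs[n], s, a, s_next, rng.random(), t)
                  for n in range(3)])
```

The one design choice mirrored from the consensus + innovations idea is that the consensus weight decays more slowly than the innovation weight, so the agents' estimates are driven toward agreement while each still tracks its local cost feedback.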