Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Cited by: 0
Authors
Chen, Ziyi [1]
Zhou, Yi [1]
Chen, Rong-Rong [1]
Zou, Shaofeng [2]
Affiliations
[1] Univ Utah, Dept Elect & Comp Engn, Salt Lake City, UT 84112 USA
[2] SUNY Buffalo, Dept Elect Engn, Buffalo, NY USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022
Funding
U.S. National Science Foundation
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents' sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents' local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $\mathcal{O}(\epsilon^{-2} \ln \epsilon^{-1})$ and $\mathcal{O}(\epsilon^{-3} \ln \epsilon^{-1})$, respectively, and achieve an improved communication complexity $\mathcal{O}(\epsilon^{-1} \ln \epsilon^{-1})$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.
Pages: 41