Sample and Communication-Efficient Decentralized Actor-Critic Algorithms with Finite-Time Analysis

Cited by: 0

Authors:

Chen, Ziyi [1]

Zhou, Yi [1]

Chen, Rong-Rong [1]

Zou, Shaofeng [2]

Affiliations:

[1] Univ Utah, Dept Elect & Comp Engn, Salt Lake City, UT 84112 USA

[2] SUNY Buffalo, Dept Elect Engn, Buffalo, NY USA

Source:

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162 | 2022

Funding:

U.S. National Science Foundation;

Keywords:

None listed

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to learn the optimal joint control policy. However, existing decentralized AC algorithms either need to share agents' sensitive information or lack communication-efficiency. In this work, we develop decentralized AC and natural AC (NAC) algorithms that avoid sharing agents' local information and are sample and communication-efficient. In both algorithms, agents share only noisy rewards and use mini-batch local policy gradient updates to ensure high sample and communication efficiency. Particularly for decentralized NAC, we develop a decentralized Markovian SGD algorithm with an adaptive mini-batch size to efficiently compute the natural policy gradient. Under Markovian sampling and linear function approximation, we prove that the proposed decentralized AC and NAC algorithms achieve the state-of-the-art sample complexities $O(\epsilon^{-2} \ln \epsilon^{-1})$ and $O(\epsilon^{-3} \ln \epsilon^{-1})$, respectively, and achieve an improved communication complexity $O(\epsilon^{-1} \ln \epsilon^{-1})$. Numerical experiments demonstrate that the proposed algorithms achieve lower sample and communication complexities than the existing decentralized AC algorithms.


Pages: 41

References: 74 in total

