Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning With Networked Agents

Cited by: 15
Authors
Zhang, Kaiqing [1,2]
Yang, Zhuoran [3 ]
Liu, Han [4 ]
Zhang, Tong [5 ]
Basar, Tamer [1,2]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[3] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
[4] Northwestern Univ, Dept Elect Engn & Comp Sci & Stat, Evanston, IL 60208 USA
[5] Hong Kong Univ Sci & Technol, Dept Comp Sci & Engn, Hong Kong, Peoples R China
Keywords
Games; Markov processes; Approximation algorithms; Game theory; Heuristic algorithms; Function approximation; Reinforcement learning; Machine learning; multiagent systems; networked control systems; statistical learning; ZERO-SUM GAMES; POLICY ITERATION; GRAPHICAL GAMES; CONVERGENCE; CONSENSUS
DOI
10.1109/TAC.2021.3049345
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Despite the increasing interest in multiagent reinforcement learning (MARL) in multiple communities, understanding its theoretical foundation has long been recognized as a challenging problem. In this article, we address this problem by providing a finite-sample analysis for decentralized batch MARL. Specifically, we consider a type of mixed MARL setting with both cooperative and competitive agents, where two teams of agents compete in a zero-sum game setting, while the agents within each team collaborate by communicating over a time-varying network. This setting covers many conventional MARL settings in the literature. We then develop batch MARL algorithms that can be implemented in a decentralized fashion, and quantify the finite-sample errors of the estimated action-value functions. Our error analysis captures how the function class, the number of samples within each iteration, and the number of iterations determine the statistical accuracy of the proposed algorithms. Compared with finite-sample bounds for single-agent reinforcement learning, our results involve additional error terms caused by the decentralized computation that is inherent in our MARL setting. This article provides the first finite-sample analysis for batch MARL, a step toward rigorous theoretical understanding of general MARL algorithms in the finite-sample regime.
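To make the abstract's algorithmic idea concrete, the following is a minimal sketch of one team's decentralized fitted-Q iteration with a consensus step over a communication network. It assumes linear Q-functions, synthetic data, and a fixed ring-topology mixing matrix; all names (n_agents, phi, W, etc.) are illustrative, and this is not the paper's exact algorithm or notation.

import numpy as np

# Minimal toy: decentralized fitted-Q iteration for one team of networked
# agents. Each agent fits a local linear Q-function to its own rewards,
# then averages parameters with its neighbors (consensus). Hypothetical
# sketch only; the data and all names are illustrative.

rng = np.random.default_rng(0)

n_agents, n_samples, dim = 4, 200, 8   # team size, batch size, feature dim
gamma = 0.95                           # discount factor

# Batch of transitions: features of (s, a), per-agent rewards, and features
# of the successor pair (s', a') (assumed precomputed for simplicity).
phi = rng.normal(size=(n_samples, dim))
phi_next = rng.normal(size=(n_samples, dim))
rewards = rng.normal(size=(n_agents, n_samples))

# Doubly stochastic consensus weights for the communication graph
# (fixed ring here; the paper allows time-varying networks).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

theta = np.zeros((n_agents, dim))      # local Q-function parameters

for t in range(50):                    # outer fitted-Q iterations
    new_theta = np.empty_like(theta)
    for i in range(n_agents):
        # Local regression target: r_i + gamma * Q_i(s', a').
        y = rewards[i] + gamma * phi_next @ theta[i]
        # Local least-squares fit of Q_i to the target.
        new_theta[i], *_ = np.linalg.lstsq(phi, y, rcond=None)
    # Consensus step: mix parameters with neighbors so the team tracks
    # the (locally unobservable) team-average value function.
    theta = W @ new_theta

print("max inter-agent disagreement:",
      float(np.abs(theta - theta.mean(axis=0)).max()))

The doubly stochastic mixing matrix lets each agent track a team-average fit without ever observing the other agents' rewards; the "additional error terms caused by decentralized computation" mentioned in the abstract quantify the cost of exactly this consensus step.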
Pages: 5925-5940
Number of pages: 16