Distributed Consensus-Based Multi-Agent Off-Policy Temporal-Difference Learning

Cited by: 5

Authors:

Stankovic, Milos S. [1,2,3]

Beko, Marko [3,4]

Stankovic, Srdjan S. [5]

Affiliations:

[1] Univ Singidunum, Belgrade, Serbia

[2] Vlatacom Inst, Belgrade, Serbia

[3] Univ Lusofona Humanidades & Tecnol, COPELABS, Lisbon, Portugal

[4] Univ Lisbon, Inst Telecomunicacoes, Inst Super Tecn, Lisbon, Portugal

[5] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia

Source:

2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC) | 2021

Keywords:

APPROXIMATION WEAK-CONVERGENCE; TIME; OPTIMIZATION; NETWORKS;

DOI:

10.1109/CDC45484.2021.9683607

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

In this paper we propose two novel distributed consensus-based temporal-difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes. The algorithms are composed of: 1) local parameter updates based on single-agent off-policy algorithms TD(lambda) and ETD(lambda), and 2) a linear dynamic consensus scheme. The algorithms are completely decentralized, allowing: 1) efficient parallelization and 2) applications in which all the agents may have completely different behavior policies and different initial state distributions while evaluating a single target policy. Starting from the properties of the underlying Feller-Markov processes, we show that, under nonrestrictive assumptions, the algorithms weakly converge to a unique consensus point. A discussion is given on the asymptotic parameter values at consensus, including estimation bias and variance. The algorithms' properties are illustrated by characteristic simulation results.
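The two-part structure described in the abstract (a local off-policy TD update at each agent, followed by a linear consensus step) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract gives no update equations, so the trace recursion below is one common form of off-policy TD(lambda) with importance-sampling ratios, the consensus step is a plain weighted average with an assumed row-stochastic weight matrix, and ETD(lambda) is not covered. The names `local_td_lambda_step`, `consensus_step`, and all parameters are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]


def dot(u: Vector, v: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def local_td_lambda_step(
    theta: Vector,      # agent's current parameter vector
    trace: Vector,      # agent's eligibility trace from the previous step
    phi_s: Vector,      # feature vector of the current state
    phi_next: Vector,   # feature vector of the next state
    reward: float,
    rho: float,         # importance ratio pi(a|s) / b_i(a|s) at this step
    rho_prev: float,    # importance ratio at the previous step
    gamma: float,       # discount factor
    lam: float,         # trace-decay parameter lambda
    alpha: float,       # step size
) -> Tuple[Vector, Vector]:
    """One local off-policy TD(lambda) update (assumed form, see lead-in)."""
    # Trace recursion: e_t = gamma * lam * rho_{t-1} * e_{t-1} + phi(s_t)
    new_trace = [gamma * lam * rho_prev * e + p for e, p in zip(trace, phi_s)]
    # TD error under the linear approximation v(s) = theta . phi(s)
    delta = reward + gamma * dot(theta, phi_next) - dot(theta, phi_s)
    # Importance-weighted parameter update
    new_theta = [th + alpha * rho * delta * e for th, e in zip(theta, new_trace)]
    return new_theta, new_trace


def consensus_step(thetas: List[Vector], weights: List[List[float]]) -> List[Vector]:
    """Linear consensus: theta_i <- sum_j weights[i][j] * theta_j.

    `weights` is assumed row-stochastic (each row sums to 1), with
    weights[i][j] > 0 only if agent i receives information from agent j.
    """
    n, d = len(thetas), len(thetas[0])
    return [
        [sum(weights[i][j] * thetas[j][k] for j in range(n)) for k in range(d)]
        for i in range(n)
    ]
```

In a full simulation each agent would call `local_td_lambda_step` on its own trajectory, generated by its own behavior policy, and all agents would then exchange parameters through `consensus_step` at every iteration. Step-size schedules and the conditions on the weight matrix required for convergence are specified in the paper, not in this sketch.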


Pages: 5976-5981

Number of pages: 6
