Distributed Consensus-Based Multi-Agent Off-Policy Temporal-Difference Learning

Times cited: 5
Authors
Stankovic, Milos S. [1 ,2 ,3 ]
Beko, Marko [3 ,4 ]
Stankovic, Srdjan S. [5 ]
Affiliations
[1] Univ Singidunum, Belgrade, Serbia
[2] Vlatacom Inst, Belgrade, Serbia
[3] Univ Lusofona Humanidades & Tecnol, COPELABS, Lisbon, Portugal
[4] Univ Lisbon, Inst Telecomunicacoes, Inst Super Tecn, Lisbon, Portugal
[5] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
Source
2021 60th IEEE Conference on Decision and Control (CDC) | 2021
Keywords
APPROXIMATION WEAK-CONVERGENCE; TIME; OPTIMIZATION; NETWORKS
DOI
10.1109/CDC45484.2021.9683607
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
In this paper we propose two novel distributed consensus-based temporal-difference algorithms for multi-agent off-policy learning of linear approximation of the value function in Markov decision processes. The algorithms are composed of: 1) local parameter updates based on single-agent off-policy algorithms TD(lambda) and ETD(lambda), and 2) a linear dynamic consensus scheme. The algorithms are completely decentralized, allowing: 1) efficient parallelization and 2) applications in which all the agents may have completely different behavior policies and different initial state distributions while evaluating a single target policy. Starting from the properties of the underlying Feller-Markov processes, we show that, under nonrestrictive assumptions, the algorithms weakly converge to a unique consensus point. A discussion is given on the asymptotic parameter values at consensus, including estimation bias and variance. The algorithms' properties are illustrated by characteristic simulation results.
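The two-part structure named in the abstract, a local off-policy TD(lambda) update followed by a linear consensus step, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' algorithm: the toy MDP, the features, the step size, the consensus matrix `A`, and all function names are hypothetical, the trace update uses one common importance-weighted convention that may differ from the paper's, and the ETD(lambda) variant is not shown.

```python
import numpy as np


def td_lambda_local_step(theta, trace, phi_s, phi_next, reward,
                         rho, rho_prev, alpha, gamma, lam):
    """One off-policy TD(lambda) update for a single agent (assumed form).

    rho is the importance ratio target(a|s) / behavior(a|s) for the current
    action; rho_prev is the ratio from the previous step.
    """
    trace = gamma * lam * rho_prev * trace + phi_s             # eligibility trace
    delta = reward + gamma * phi_next @ theta - phi_s @ theta  # TD error
    theta = theta + alpha * rho * delta * trace
    return theta, trace


def consensus_step(thetas, A):
    """Linear consensus: each agent takes a weighted average of the others.

    thetas has one row per agent; A is assumed row-stochastic.
    """
    return A @ thetas


def run_sketch(num_steps=200, seed=0, alpha=0.01, gamma=0.9, lam=0.3):
    rng = np.random.default_rng(seed)
    n_states, n_actions, d, n_agents = 4, 2, 2, 3

    # Hypothetical toy MDP and linear features.
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.normal(size=(n_states, n_actions))
    Phi = rng.normal(size=(n_states, d))

    # One shared target policy, a different behavior policy per agent.
    target = np.array([0.5, 0.5])
    behaviors = np.array([[0.3, 0.7], [0.5, 0.5], [0.7, 0.3]])

    # Hypothetical row-stochastic weights for a 3-agent line graph.
    A = np.array([[0.5, 0.5, 0.0],
                  [0.25, 0.5, 0.25],
                  [0.0, 0.5, 0.5]])

    thetas = np.zeros((n_agents, d))
    traces = np.zeros((n_agents, d))
    states = rng.integers(n_states, size=n_agents)  # different initial states
    rho_prev = np.ones(n_agents)

    for _ in range(num_steps):
        for i in range(n_agents):
            s = states[i]
            a = rng.choice(n_actions, p=behaviors[i])
            s_next = rng.choice(n_states, p=P[s, a])
            rho = target[a] / behaviors[i, a]
            thetas[i], traces[i] = td_lambda_local_step(
                thetas[i], traces[i], Phi[s], Phi[s_next], R[s, a],
                rho, rho_prev[i], alpha, gamma, lam)
            rho_prev[i] = rho
            states[i] = s_next
        thetas = consensus_step(thetas, A)  # agents exchange parameters
    return thetas


if __name__ == "__main__":
    print(run_sketch())
```

Each row of the returned array is one agent's parameter estimate. In this sketch the local updates need only an agent's own trajectory, and the consensus step needs only the parameters of its neighbors in `A`, which is what makes the scheme decentralized.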
Pages: 5976-5981
Number of pages: 6