Multi-agent off-policy actor-critic algorithm for distributed multi-task reinforcement learning

Cited by: 3
Authors
Stankovic, Milos S. [1 ,2 ]
Beko, Marko [3 ,4 ]
Ilic, Nemanja [2 ,5 ]
Stankovic, Srdjan S. [6 ]
Affiliations
[1] Singidunum Univ, Belgrade 11000, Serbia
[2] Vlatacom Inst, Belgrade 11000, Serbia
[3] Univ Lisbon, Inst Telecomunicacoes, Inst Super Tecn, P-1049001 Lisbon, Portugal
[4] Univ Lusofona, COPELABS, P-1700097 Lisbon, Portugal
[5] Coll Appl Tech Sci, Krusevac 37000, Serbia
[6] Univ Belgrade, Sch Elect Engn, Belgrade 11000, Serbia
Keywords
Reinforcement learning; Distributed consensus; Multi-agent systems; Actor-Critic learning; Convergence analysis; Policy gradient; Multi-task learning; Off-policy learning; Weak convergence; Collaborative networks;
DOI
10.1016/j.ejcon.2023.100853
CLC number
TP [Automation Technology, Computer Technology];
Subject classification code
0812 ;
Abstract
In this paper, a new distributed multi-agent Actor-Critic algorithm for reinforcement learning is proposed for solving multi-agent multi-task optimization problems. The Critic algorithm is in the form of a Distributed Emphatic Temporal Difference (DETD(lambda)) algorithm, while the Actor algorithm is proposed as a complementary consensus-based policy gradient algorithm, derived from a global objective function having the role of a scalarizing function in multi-objective optimization. It is demonstrated that the Feller-Markov properties hold for the newly derived Actor algorithm. A proof of the weak convergence of the algorithm to the limit set of an attached ODE is derived under mild conditions, using a specific decomposition between the Critic and the Actor algorithms together with two-time-scale stochastic approximation arguments. An experimental verification of the algorithm's properties is given, showing that the algorithm can represent an efficient tool in practice. (c) 2023 European Control Association. Published by Elsevier Ltd. All rights reserved.
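The abstract combines two standard ingredients: an emphatic TD(lambda) critic run locally by each agent, and a consensus step that mixes parameters over the communication graph. The sketch below is only an illustration of those two generic building blocks, not the paper's DETD(lambda) or its Actor update; function names, step sizes, and the simplified per-step importance weight `rho` are all assumptions.

```python
import numpy as np

def etd_lambda_step(w, e, F, phi, phi_next, reward,
                    gamma=0.95, lam=0.9, alpha=0.01, rho=1.0, interest=1.0):
    """One simplified emphatic TD(lambda) update of a linear value
    estimate w, with eligibility trace e and followon trace F.
    (Illustrative sketch; a single rho is used per step for brevity.)"""
    F = rho * gamma * F + interest                 # followon trace
    M = lam * interest + (1.0 - lam) * F           # emphasis weighting
    e = rho * (gamma * lam * e + M * phi)          # emphatic eligibility trace
    delta = reward + gamma * (w @ phi_next) - (w @ phi)  # TD error
    return w + alpha * delta * e, e, F

def consensus_mix(params, W):
    """Consensus step: row i of params is agent i's parameter vector;
    W is a row-stochastic mixing matrix over the communication graph."""
    return W @ params

# Toy usage: three agents with random parameters, uniform mixing weights.
rng = np.random.default_rng(0)
n_agents, dim = 3, 4
params = rng.standard_normal((n_agents, dim))
W = np.full((n_agents, n_agents), 1.0 / n_agents)  # fully connected graph
mixed = consensus_mix(params, W)  # every agent now holds the network average
```

With a doubly stochastic `W`, repeated mixing drives all agents to the average of their initial parameters, which is the mechanism by which consensus-based schemes combine the agents' local (multi-task) estimates.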
Pages: 9