Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies

Cited by: 0

Authors:

Stankovic, Milos S. [1]

Stankovic, Srdjan S. [2, 3]

Affiliations:

[1] Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia

[2] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia

[3] Vlatacom Inst, Belgrade, Serbia

Source:

2016 AMERICAN CONTROL CONFERENCE (ACC) | 2016

Keywords:

STOCHASTIC-APPROXIMATION; CONSENSUS; OPTIMIZATION;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812 ;

Abstract:

In this paper we propose two novel distributed algorithms for iterative multi-agent off-policy linear value function approximation in Markov decision processes. The algorithms do not require any fusion center and are based on incorporating consensus-based collaborations between the agents over time-varying communication networks into recently proposed single-agent algorithms. The resulting distributed algorithms allow the agents to have different behavior policies while evaluating the response to a single target policy, using the same linear parametrization of the value function. Under appropriate assumptions on the time-varying network topology and the overall state-visiting distributions of the agents we prove for both algorithms weak convergence of the parameter estimates to a consensus point determined by an associated ODE. By a proper design of the network parameters and/or topology, this point can be tuned to coincide with the globally optimal point. The properties and the effectiveness of the proposed algorithms are illustrated on an example.
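The abstract describes the general pattern of a local off-policy update followed by consensus mixing over a time-varying network. The following is a minimal sketch of that pattern only, not the authors' algorithms: the abstract says the proposed methods build on recently proposed single-agent algorithms that it does not name, so a plain importance-weighted TD(0) step is used here as a stand-in. That stand-in is not guaranteed to converge off-policy with linear function approximation. The function names, step size `alpha`, discount `gamma`, importance ratio `rho`, and the row-stochastic `weights` matrix are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def local_td_step(theta: Vector, phi: Vector, phi_next: Vector,
                  reward: float, rho: float,
                  gamma: float = 0.9, alpha: float = 0.05) -> Vector:
    """One importance-weighted TD(0) step for the linear estimate dot(phi, theta).

    rho is the assumed target/behavior policy probability ratio for the
    action taken; each agent may have its own behavior policy.
    """
    delta = reward + gamma * dot(phi_next, theta) - dot(phi, theta)
    return [t + alpha * rho * delta * f for t, f in zip(theta, phi)]


def consensus_step(thetas: List[Vector],
                   weights: List[List[float]]) -> List[Vector]:
    """Mix the agents' parameter vectors with a row-stochastic weight matrix.

    weights[i][j] is the weight agent i puts on agent j's estimate; a zero
    entry means no link at this time step, so passing a different matrix
    at each step models a time-varying topology.
    """
    n_agents = len(thetas)
    n_params = len(thetas[0])
    return [
        [sum(weights[i][j] * thetas[j][k] for j in range(n_agents))
         for k in range(n_params)]
        for i in range(n_agents)
    ]
```

In a full loop, each agent would call `local_td_step` on its own transition and the network would then call `consensus_step` with that step's weight matrix; no fusion center is involved because each row of `weights` uses only the estimates of that agent's current neighbors.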


Pages: 167-172

Page count: 6
