Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies

Cited by: 0
Authors
Stankovic, Milos S. [1]
Stankovic, Srdjan S. [2, 3]
Affiliations
[1] Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia
[2] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
[3] Vlatacom Inst, Belgrade, Serbia
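Source
2016 AMERICAN CONTROL CONFERENCE (ACC) | 2016
Keywords
STOCHASTIC-APPROXIMATION; CONSENSUS; OPTIMIZATION
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract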
In this paper, we propose two novel distributed algorithms for iterative multi-agent off-policy linear value function approximation in Markov decision processes. The algorithms do not require a fusion center; they incorporate consensus-based collaboration between the agents over time-varying communication networks into recently proposed single-agent algorithms. The resulting distributed algorithms allow the agents to follow different behavior policies while evaluating the response to a single target policy, using the same linear parametrization of the value function. Under appropriate assumptions on the time-varying network topology and the overall state-visiting distributions of the agents, we prove, for both algorithms, weak convergence of the parameter estimates to a consensus point determined by an associated ODE. By properly designing the network parameters and/or topology, this point can be tuned to coincide with the globally optimal point. The properties and effectiveness of the proposed algorithms are illustrated on an example.
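The general pattern the abstract describes (a local value-function update at each agent, followed by consensus mixing over a changing network) can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithms of the paper: the local rule is a plain importance-weighted linear TD(0) step swapped in as a stand-in, the update-then-mix ordering is assumed, and all names (`local_td_step`, `consensus_step`, `distributed_step`, `W_A`, `W_B`) and numeric values are hypothetical.

```python
import numpy as np

def local_td_step(theta, phi, phi_next, reward, rho, gamma=0.9, step=0.1):
    # TD error under the shared linear parametrization V(s) ~ phi(s) @ theta.
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)
    # rho: importance ratio target/behavior for the action taken (assumed given).
    # Stand-in rule only; the paper's single-agent algorithms are not specified here.
    return theta + step * rho * delta * phi

def consensus_step(thetas, weights):
    # thetas: (n_agents, dim); weights: (n_agents, n_agents), row-stochastic.
    # Row i of the result is a convex combination of the agents' estimates.
    return weights @ thetas

def distributed_step(thetas, samples, weights, gamma=0.9, step=0.1):
    # samples[i] = (phi, phi_next, reward, rho) seen by agent i under its own
    # behavior policy. Assumed ordering: local update first, then mixing.
    updated = np.array([
        local_td_step(theta, *sample, gamma=gamma, step=step)
        for theta, sample in zip(thetas, samples)
    ])
    return consensus_step(updated, weights)

def disagreement(thetas):
    # Largest distance of any agent's estimate from the network average.
    return float(np.max(np.linalg.norm(thetas - thetas.mean(axis=0), axis=1)))

# Two alternating topologies on 3 agents. Neither graph is connected on its
# own, but their union is: a toy instance of a time-varying network.
W_A = np.array([[0.5, 0.5, 0.0],
                [0.5, 0.5, 0.0],
                [0.0, 0.0, 1.0]])
W_B = np.array([[1.0, 0.0, 0.0],
                [0.0, 0.5, 0.5],
                [0.0, 0.5, 0.5]])

# Consensus-only run: with no local updates, the estimates should agree.
thetas = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
initial_gap = disagreement(thetas)
for t in range(20):
    thetas = consensus_step(thetas, W_A if t % 2 == 0 else W_B)
final_gap = disagreement(thetas)
```

Because both hypothetical weight matrices are symmetric (hence doubly stochastic), the consensus-only run preserves the network average, so the agents agree on the mean of their initial estimates. With weights that are only row-stochastic, the agreement point would in general be a different weighted combination, which is one way to read the abstract's remark that the consensus point can be tuned through the network parameters.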
Pages: 167-172
Number of pages: 6