Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies

Cited by: 0
Authors
Stankovic, Milos S. [1]
Stankovic, Srdjan S. [2, 3]
Affiliations
[1] Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia
[2] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
[3] Vlatacom Inst, Belgrade, Serbia
Source
2016 AMERICAN CONTROL CONFERENCE (ACC) | 2016
Keywords
STOCHASTIC-APPROXIMATION; CONSENSUS; OPTIMIZATION;
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
In this paper we propose two novel distributed algorithms for iterative multi-agent off-policy linear value function approximation in Markov decision processes. The algorithms do not require any fusion center and are based on incorporating consensus-based collaborations between the agents over time-varying communication networks into recently proposed single-agent algorithms. The resulting distributed algorithms allow the agents to have different behavior policies while evaluating the response to a single target policy, using the same linear parametrization of the value function. Under appropriate assumptions on the time-varying network topology and the overall state-visiting distributions of the agents we prove for both algorithms weak convergence of the parameter estimates to a consensus point determined by an associated ODE. By a proper design of the network parameters and/or topology, this point can be tuned to coincide with the globally optimal point. The properties and the effectiveness of the proposed algorithms are illustrated on an example.
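The consensus-plus-local-update structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithms: it substitutes plain importance-weighted TD(0) for the single-agent algorithms the paper builds on, and the function names, the step size `alpha`, the discount `gamma`, and the row-stochastic mixing matrix `weights` are all assumptions made for illustration.

```python
import numpy as np

def local_td_step(theta, phi, phi_next, reward, rho, alpha=0.1, gamma=0.9):
    """One importance-weighted TD(0) update of a linear value estimate.

    theta    : this agent's parameter vector
    phi      : feature vector of the current state
    phi_next : feature vector of the next state
    rho      : importance ratio target_policy / behavior_policy (assumed given)
    """
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    phi_next = np.asarray(phi_next, dtype=float)
    # Temporal-difference error under the linear parametrization V(s) = phi(s) @ theta
    delta = reward + gamma * (phi_next @ theta) - (phi @ theta)
    return theta + alpha * rho * delta * phi

def consensus_step(thetas, weights):
    """Mix the agents' parameter vectors using a row-stochastic matrix.

    thetas  : array of shape (n_agents, n_features), one row per agent
    weights : array of shape (n_agents, n_agents); it may differ at every
              iteration to model a time-varying communication network
    """
    weights = np.asarray(weights, dtype=float)
    if not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("each row of weights must sum to 1")
    return weights @ np.asarray(thetas, dtype=float)
```

In such a scheme each agent would apply `local_td_step` to its own transition, generated by its own behavior policy, and the network would then apply `consensus_step` with the mixing matrix in force at that iteration. No fusion center is involved, since row `i` of `weights` only needs nonzero entries for the neighbors of agent `i`.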
Pages: 167-172
Page count: 6
Related papers
50 in total
[21]   Consensus Analysis for Linear Multi-agent Systems with Time-varying Delays [J].
Zhang, Fen ;
Li, Zhi .
PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, :8281-8286
[22]   Average consensus in multi-agent systems with uncertain topologies and multiple time-varying delays [J].
Shang, Yilun .
LINEAR ALGEBRA AND ITS APPLICATIONS, 2014, 459 :411-429
[23]   Cooperative output regulation of heterogeneous linear multi-agent systems with edge-event triggered adaptive control under time-varying topologies [J].
Zhang, Juan ;
Zhang, Huaguang ;
Lu, Yanzheng ;
Sun, Shaoxin .
NEURAL COMPUTING & APPLICATIONS, 2020, 32 (19) :15573-15584
[24]   Distributed resource allocation via multi-agent systems under time-varying networks [J].
Lu, Kaihong ;
Xu, Hang ;
Zheng, Yuanshi .
AUTOMATICA, 2022, 136
[25]   Convergence Rate Analysis for Discrete-Time Multi-Agent Systems with Time-Varying Delays [J].
Chen Yao ;
Ho, Daniel W. C. ;
Lu Jinhu ;
Lin Zongli .
PROCEEDINGS OF THE 29TH CHINESE CONTROL CONFERENCE, 2010, :4578-4583
[26]   Formation Control for Nonlinear Multi-agent Systems with Diverse Time-Varying Delays and Uncertain Topologies [J].
Luo, Hefu ;
Peng, Shiguo .
2017 29TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC), 2017, :1730-1736
[27]   Adaptive synchronization of linear multi-agent systems with time-varying multiple delays [J].
Petrillo, Alberto ;
Salvi, Alessandro ;
Santini, Stefania ;
Valente, Antonio Saverio .
JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2017, 354 (18) :8586-8605
[28]   Consensus of linear time-varying multi-agent systems with a variable number of nodes [J].
Ji, Xiaolei ;
Hao, Fei .
JOURNAL OF THE FRANKLIN INSTITUTE, 2025, 362 (09)
[29]   Average dwell-time conditions for consensus of discrete-time linear multi-agent systems with switching topologies and time-varying delays [J].
Ge, Yan-Rong ;
Chen, Yang-Zhou ;
Zhang, Ya-Xiao .
Zidonghua Xuebao/Acta Automatica Sinica, 2014, 40 (11) :2609-2617
[30]   Synchronization of Multi-Agent Systems under Time-Varying Network via Time-Delay Approach to Averaging [J].
Caiazzo, Bianca ;
Fridman, Emilia ;
Petrillo, Alberto ;
Santini, Stefania .
IFAC PAPERSONLINE, 2022, 55 (36) :133-138