Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies

Citations: 0
Authors
Stankovic, Milos S. [1]
Stankovic, Srdjan S. [2, 3]
Affiliations
[1] Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia
[2] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
[3] Vlatacom Inst, Belgrade, Serbia
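Source
2016 AMERICAN CONTROL CONFERENCE (ACC) | 2016
Keywords
STOCHASTIC-APPROXIMATION; CONSENSUS; OPTIMIZATION
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract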
In this paper, we propose two novel distributed algorithms for iterative multi-agent off-policy linear value function approximation in Markov decision processes. The algorithms do not require any fusion center and are based on incorporating consensus-based collaborations between the agents over time-varying communication networks into recently proposed single-agent algorithms. The resulting distributed algorithms allow the agents to have different behavior policies while evaluating the response to a single target policy, using the same linear parametrization of the value function. Under appropriate assumptions on the time-varying network topology and the overall state-visiting distributions of the agents, we prove, for both algorithms, weak convergence of the parameter estimates to a consensus point determined by an associated ODE. By a proper design of the network parameters and/or topology, this point can be tuned to coincide with the globally optimal point. The properties and the effectiveness of the proposed algorithms are illustrated on an example.
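The general pattern the abstract describes, a local temporal-difference step followed by consensus averaging over a time-varying network, can be illustrated with a small sketch. This is not the authors' method: the abstract concerns off-policy algorithms, whereas the sketch below uses plain on-policy TD(0) with a constant step size on a toy Markov chain shared by all agents. Every name in it (`random_mixing_matrix`, `consensus_td0`, `spread`, the problem sizes, and the step size) is a hypothetical choice made for illustration only.

```python
import random

def random_mixing_matrix(num_agents, rng, link_prob=0.5):
    """Row-stochastic weights for one randomly drawn directed topology."""
    matrix = []
    for i in range(num_agents):
        # Self-loops are always kept, so every row has positive mass.
        row = [rng.uniform(0.1, 1.0) if (i == j or rng.random() < link_prob) else 0.0
               for j in range(num_agents)]
        total = sum(row)
        matrix.append([w / total for w in row])
    return matrix

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def spread(thetas):
    """Largest coordinate-wise gap between any two agents' parameters."""
    return max(max(col) - min(col) for col in zip(*thetas))

def consensus_td0(num_agents=3, num_states=5, num_features=3, steps=5000,
                  gamma=0.9, step_size=0.01, seed=0):
    rng = random.Random(seed)
    # Hypothetical toy problem: random Markov chain, rewards, and features.
    P = []
    for _ in range(num_states):
        row = [rng.uniform(0.1, 1.0) for _ in range(num_states)]
        total = sum(row)
        P.append([p / total for p in row])
    rewards = [rng.random() for _ in range(num_states)]
    features = [[rng.gauss(0, 1) for _ in range(num_features)]
                for _ in range(num_states)]
    # Agents start from deliberately different parameter vectors.
    thetas = [[rng.gauss(0, 5) for _ in range(num_features)]
              for _ in range(num_agents)]
    initial = [row[:] for row in thetas]
    states = [rng.randrange(num_states) for _ in range(num_agents)]
    for _ in range(steps):
        # Step 1: each agent takes a local TD(0) step on its own trajectory.
        for i in range(num_agents):
            s = states[i]
            s_next = rng.choices(range(num_states), weights=P[s])[0]
            td_error = (rewards[s] + gamma * dot(features[s_next], thetas[i])
                        - dot(features[s], thetas[i]))
            thetas[i] = [th + step_size * td_error * f
                         for th, f in zip(thetas[i], features[s])]
            states[i] = s_next
        # Step 2: consensus averaging over a freshly drawn topology.
        A = random_mixing_matrix(num_agents, rng)
        thetas = [[dot(A[i], column) for column in zip(*thetas)]
                  for i in range(num_agents)]
    return initial, thetas
```

Calling `initial, final = consensus_td0()` and comparing `spread(initial)` with `spread(final)` shows how far the consensus step pulls the agents' estimates together. The mixing matrices here are only row-stochastic, so the sketch makes no claim about which consensus point is reached.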
Pages: 167-172
Page count: 6
Related papers
50 in total
  • [31] Time-varying formation control for double-integrator multi-agent systems with jointly connected topologies
    Dong, Xiwang
    Han, Liang
    Li, Qingdong
    Ren, Zhang
    [J]. INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2016, 47 (16) : 3829 - 3838
  • [32] Consensus Analysis in Multi-agent Systems with Non-uniform Time-varying Delays and Uncertain Topologies
    Subbarao, Kamesh
    Bhusal, Rajnish
    [J]. IFAC PAPERSONLINE, 2022, 55 (36) : 139 - 144
  • [33] Containment analysis and design for general linear multi-agent systems with time-varying delays
    Dong, Xiwang
    Han, Liang
    Li, Qingdong
    Chen, Jian
    Ren, Zhang
    [J]. NEUROCOMPUTING, 2016, 173 : 2062 - 2068
  • [34] Time-varying formation control for linear multi-agent systems with distributed adaptive protocols
    Wang, Rui
    Dong, Xiwang
    Li, Qingdong
    Ren, Zhang
    [J]. PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 1332 - 1337
  • [35] Distributed adaptive control for time-varying formation of general linear multi-agent systems
    Wang, Rui
    Dong, Xiwang
    Li, Qingdong
    Ren, Zhang
    [J]. INTERNATIONAL JOURNAL OF SYSTEMS SCIENCE, 2017, 48 (16) : 3491 - 3503
  • [36] Time-varying formation feasibility analysis for linear multi-agent systems with time delays and switching graphs
    Dong, Xiwang
    Hua, Yongzhao
    Hu, Guoqiang
    Ren, Zhang
    [J]. PROCEEDINGS OF THE 38TH CHINESE CONTROL CONFERENCE (CCC), 2019, : 6263 - 6268
  • [37] Leader-following consensus criteria for multi-agent systems with time-varying delays and switching interconnection topologies
    Park, M. J.
    Kwon, O. M.
    Park, Ju H.
    Lee, S. M.
    Cha, E. J.
    [J]. Chinese Physics B, 2012, 21 (11) : 146 - 155
  • [38] Time-Varying Group Formation Control for Multi-Agent Systems with Second-Order Dynamics and Directed Topologies
    Dong, Xiwang
    Li, Qingdong
    Zhao, Qilun
    Ren, Zhang
    [J]. PROCEEDINGS OF THE 2016 12TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA), 2016, : 350 - 355
  • [39] Consensus tracking control for time-varying delayed linear multi-agent systems under relative state saturation constraints
    Zanganeh, Javad
    Hosseini Sani, Seyed Kamal
    Pariz, Naser
    [J]. TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2023,
  • [40] Time-varying group formation analysis and design for second-order multi-agent systems with directed topologies
    Dong, Xiwang
    Li, Qingdong
    Zhao, Qilun
    Ren, Zhang
    [J]. NEUROCOMPUTING, 2016, 205 : 367 - 374