Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies

Cited by: 0
Authors
Stankovic, Milos S. [1 ]
Stankovic, Srdjan S. [2 ,3 ]
Affiliations
[1] Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia
[2] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia
[3] Vlatacom Inst, Belgrade, Serbia
Source
2016 AMERICAN CONTROL CONFERENCE (ACC) | 2016
Keywords
STOCHASTIC-APPROXIMATION; CONSENSUS; OPTIMIZATION;
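DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract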
In this paper we propose two novel distributed algorithms for iterative multi-agent off-policy linear value function approximation in Markov decision processes. The algorithms do not require any fusion center and are based on incorporating consensus-based collaborations between the agents over time-varying communication networks into recently proposed single-agent algorithms. The resulting distributed algorithms allow the agents to have different behavior policies while evaluating the response to a single target policy, using the same linear parametrization of the value function. Under appropriate assumptions on the time-varying network topology and the overall state-visiting distributions of the agents we prove for both algorithms weak convergence of the parameter estimates to a consensus point determined by an associated ODE. By a proper design of the network parameters and/or topology, this point can be tuned to coincide with the globally optimal point. The properties and the effectiveness of the proposed algorithms are illustrated on an example.
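The structure described in the abstract (a local off-policy update at each agent, followed by consensus averaging over a time-varying network) can be illustrated with a minimal sketch. The single-agent algorithms the paper builds on are not named in this record, so the sketch substitutes plain importance-weighted TD(0), which has no general convergence guarantee off-policy with linear features. The function name `consensus_td_step`, the step size, the discount factor, and the two alternating weight matrices are all hypothetical choices for illustration, not the authors' design.

```python
import numpy as np


def consensus_td_step(theta, phi, phi_next, reward, rho, A, alpha=0.05, gamma=0.9):
    """One synchronous round for N agents sharing a d-dimensional linear parametrization.

    theta:    (N, d) parameter estimates, one row per agent
    phi:      (N, d) feature vectors of each agent's current state
    phi_next: (N, d) feature vectors of each agent's next state
    reward:   (N,)   observed rewards
    rho:      (N,)   importance ratios (target policy / agent's behavior policy)
    A:        (N, N) row-stochastic consensus weights for the current topology
    """
    # TD error of each agent under its own current estimate
    v = np.sum(theta * phi, axis=1)
    v_next = np.sum(theta * phi_next, axis=1)
    delta = reward + gamma * v_next - v
    # Local off-policy step (assumed form): TD(0) scaled by the importance ratio
    local = theta + alpha * (rho * delta)[:, None] * phi
    # Consensus step: each agent takes a convex combination of neighbors' estimates
    return A @ local


# Hypothetical usage: 3 agents, 2 features, topology alternating between two graphs.
# Neither graph is connected on its own, but their union is.
rng = np.random.default_rng(0)
A_even = np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
A_odd = np.array([[1.0, 0.0, 0.0], [0.0, 0.5, 0.5], [0.0, 0.5, 0.5]])
theta = rng.normal(size=(3, 2))
for t in range(200):
    phi = rng.normal(size=(3, 2))        # synthetic features, not an MDP
    phi_next = rng.normal(size=(3, 2))
    reward = rng.normal(size=3)
    rho = rng.uniform(0.5, 1.5, size=3)  # synthetic importance ratios
    A = A_even if t % 2 == 0 else A_odd
    theta = consensus_td_step(theta, phi, phi_next, reward, rho, A)
```

Keeping the local update and the mixing matrix as separate arguments means the topology can change at every round without touching the learning rule, which is the role the time-varying network plays in the abstract.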
Pages: 167 - 172
Page count: 6
Related papers
  • [1] On Uniform Consensus of Linear Multi-Agent Systems with Time-Varying Graph Topologies
    Cai Ning
    Liu Minghua
    Wei Xiaojuan
    Ma Haiying
    2013 32ND CHINESE CONTROL CONFERENCE (CCC), 2013, : 6896 - 6899
  • [2] Distributed multi-agent temporal-difference learning with full neighbor information
    Peng, Zhinan
    Hu, Jiangping
    Luo, Rui
    Ghosh, Bijoy K.
    CONTROL THEORY AND TECHNOLOGY, 2020, 18 (04) : 379 - 389
  • [3] Distributed consensus-based multi-agent temporal-difference learning
    Stankovic, Milos S.
    Beko, Marko
    Stankovic, Srdjan S.
    AUTOMATICA, 2023, 151
  • [4] Multi-agent consensus with time-varying delays and switching topologies
    Wei, Jia
    Fang, Huajing
    JOURNAL OF SYSTEMS ENGINEERING AND ELECTRONICS, 2014, 25 (03) : 489 - 495
  • [5] Multi-agent consensus with time-varying delays and switching topologies
    Jia Wei
    Huajing Fang
    Journal of Systems Engineering and Electronics, 2014, 25 (03) : 489 - 495
  • [6] Time-varying group formation control for general linear multi-agent systems with directed topologies
    Dong, Xiwang
    Li, Qingdong
    Zhao, Qilun
    Ren, Zhang
    PROCEEDINGS OF THE 35TH CHINESE CONTROL CONFERENCE 2016, 2016, : 7733 - 7738
  • [7] Containment Control of Continuous-time Multi-agent Systems with General Linear Dynamics under Time-varying Communication Topologies
    Yang, Zhe
    Mu, Xiao-wu
    Liu, Kai
    INTERNATIONAL JOURNAL OF CONTROL AUTOMATION AND SYSTEMS, 2017, 15 (01) : 442 - 449
  • [8] Consensus of piecewise time-varying multi-agent systems with switching topologies
    Sun, Jian
    Guo, Chen
    Liu, Lei
    TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2022, 44 (13) : 2522 - 2531
  • [9] Time-varying group formation-tracking control for general linear multi-agent systems with switching topologies and time-varying delays
    Zhou, Shiyu
    Dong, Xiwang
    Tan, Qingke
    Wang, Qing
    Ren, Zhang
    2021 22ND IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2021, : 105 - 110
  • [10] Distributed constrained optimization for multi-agent networks with communication delays under time-varying topologies
    An, Yuanyuan
    Wang, Aiping
    Zhang, Xufeng
    Xiao, Feng
    SYSTEMS & CONTROL LETTERS, 2024, 185