Multi-Agent Temporal-Difference Learning with Linear Function Approximation: Weak Convergence under Time-Varying Network Topologies

Cited by: 0

Authors:

Stankovic, Milos S. [1]

Stankovic, Srdjan S. [2, 3]

Affiliations:

[1] Univ Belgrade, Innovat Ctr, Sch Elect Engn, Belgrade, Serbia

[2] Univ Belgrade, Sch Elect Engn, Belgrade, Serbia

[3] Vlatacom Inst, Belgrade, Serbia

Source:

2016 AMERICAN CONTROL CONFERENCE (ACC) | 2016

Keywords:

STOCHASTIC-APPROXIMATION; CONSENSUS; OPTIMIZATION;

DOI:

Not available

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Subject classification code:

0812 ;

Abstract:

In this paper we propose two novel distributed algorithms for iterative multi-agent off-policy linear value function approximation in Markov decision processes. The algorithms do not require any fusion center and are based on incorporating consensus-based collaborations between the agents over time-varying communication networks into recently proposed single-agent algorithms. The resulting distributed algorithms allow the agents to have different behavior policies while evaluating the response to a single target policy, using the same linear parametrization of the value function. Under appropriate assumptions on the time-varying network topology and the overall state-visiting distributions of the agents we prove for both algorithms weak convergence of the parameter estimates to a consensus point determined by an associated ODE. By a proper design of the network parameters and/or topology, this point can be tuned to coincide with the globally optimal point. The properties and the effectiveness of the proposed algorithms are illustrated on an example.


Pages: 167-172

Page count: 6

50 items in total

[41] Consensus tracking control for time-varying delayed linear multi-agent systems under relative state saturation constraints [J].

Zanganeh, Javad ;

Hosseini Sani, Seyed Kamal ;

Pariz, Naser .

TRANSACTIONS OF THE INSTITUTE OF MEASUREMENT AND CONTROL, 2023,

[42] Time-varying group formation analysis and design for second-order multi-agent systems with directed topologies [J].

Dong, Xiwang ;

Li, Qingdong ;

Zhao, Qilun ;

Ren, Zhang .

NEUROCOMPUTING, 2016, 205 :367-374

[43] Leader-following consensus criteria for multi-agent systems with time-varying delays and switching interconnection topologies [J].

Park, M. J. ;

Kwon, O. M. ;

Park, Ju H. ;

Lee, S. M. ;

Cha, E. J. .

CHINESE PHYSICS B, 2012, 21 (11)

[44] Distributed Multi-Agent Gradient Based Q-Learning with Linear Function Approximation [J].

Stankovic, Milos S. ;

Beko, Marko ;

Stankovic, Srdjan S. .

2024 EUROPEAN CONTROL CONFERENCE, ECC 2024, 2024, :2500-2505

[45] Time-varying output formation tracking of linear disturbed multi-agent systems by dynamic protocols [J].

Silva, Bruno Martins Calazans ;

Ishihara, Joao Yoshiyuki ;

Tognetti, Eduardo Stockler .

AUTOMATICA, 2025, 179

[46] Adaptive Event-Triggering Consensus for Multi-Agent Systems with Linear Time-Varying Dynamics [J].

Zhang Wenbing ;

Abuzar, Hussein Mohammed Atitalla ;

Bao Jiatong ;

Liu Yurong .

JOURNAL OF SYSTEMS SCIENCE & COMPLEXITY, 2022, 35 (05) :1700-1718

[47] Adaptive Event-Triggering Consensus for Multi-Agent Systems with Linear Time-Varying Dynamics [J].

Wenbing Zhang ;

Atitalla Abuzar Hussein Mohammed ;

Jiatong Bao ;

Yurong Liu .

Journal of Systems Science and Complexity, 2022, 35 :1700-1718

[48] Distributed Fault Detection for Linear Time-Varying Multi-Agent Systems With Relative Output Information [J].

Zou, Peilu ;

Wang, Ping ;

Yu, Chengpu .

IEEE ACCESS, 2021, 9 (09) :42933-42946

[49] Time-varying output formation tracking of heterogeneous linear multi-agent systems with dynamical controllers [J].

Liu, Congying ;

Wu, Xiaoqun ;

Wan, Xiaoxiao ;

Lu, Jinhu .

NEUROCOMPUTING, 2021, 441 :36-43

[50] Finite-time consensus iterative learning control of discrete time-varying multi-agent systems [J].

Cao W. ;

Sun M. .

Kongzhi yu Juece/Control and Decision, 2019, 34 (04) :891-896
