Distributed multi-agent temporal-difference learning with full neighbor information

Cited by: 2

Authors:

Peng, Zhinan ^{[1]}

Hu, Jiangping ^{[1]}

Luo, Rui ^{[1]}

Ghosh, Bijoy K. ^{[1,2]}

Affiliations:

[1] University of Electronic Science and Technology of China, School of Automation Engineering, Chengdu 611731, Sichuan, People's Republic of China

[2] Texas Tech University, Department of Mathematics and Statistics, Lubbock, TX 79409 USA

Source:

CONTROL THEORY AND TECHNOLOGY | 2020 / Vol. 18 / No. 04

Funding:

National Natural Science Foundation of China;

Keywords:

Distributed algorithm; Reinforcement learning; Temporal-difference learning; Multi-agent systems; STOCHASTIC-APPROXIMATION; TRACKING CONTROL; SYSTEMS; DYNAMICS;

DOI:

10.1007/s11768-020-00016-w

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

This paper presents a novel distributed multi-agent temporal-difference learning framework for value function approximation, which allows each agent to use the information from all of its neighbors instead of the information from only one neighbor. With full neighbor information, the proposed framework (1) has a faster convergence rate and (2) is more robust than state-of-the-art approaches. Based on this framework, we propose a distributed multi-agent discounted temporal-difference learning algorithm and a distributed multi-agent average-cost temporal-difference learning algorithm. Moreover, theoretical convergence proofs are provided for both algorithms. Numerical simulation results show that the proposed algorithms are superior to the gossip-based algorithm in convergence speed and in robustness to noise and time-varying network topology.
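The abstract does not state the update rule, so the following is only an illustrative sketch of the general idea, not the paper's algorithm. It assumes linear value function approximation, a common observed state with per-agent rewards, uniform averaging over each agent's own weights and those of all its neighbors, and a TD(0) correction computed from the agent's own current weights. The function name `full_neighbor_td_step` and all parameter names are hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def full_neighbor_td_step(
    weights: List[Vector],           # weights[i] is agent i's parameter vector
    neighbors: Dict[int, List[int]], # neighbors[i] lists agent i's neighbors
    phi_s: Vector,                   # feature vector of the current state
    phi_next: Vector,                # feature vector of the next state
    rewards: List[float],            # rewards[i] is agent i's local reward
    alpha: float,                    # step size (assumed constant here)
    gamma: float,                    # discount factor
) -> List[Vector]:
    """One synchronous round: average over ALL neighbors, then add a local TD(0) term.

    Assumption: uniform averaging weights; the paper may use a different
    weighting scheme or ordering of the two steps.
    """
    new_weights = []
    for i, w_i in enumerate(weights):
        # Consensus step: mix own weights with those of every neighbor.
        group = [i] + list(neighbors[i])
        mixed = [
            sum(weights[j][k] for j in group) / len(group)
            for k in range(len(w_i))
        ]
        # Local TD(0) error under linear approximation V(s) = phi(s) . w_i.
        td_error = rewards[i] + gamma * dot(phi_next, w_i) - dot(phi_s, w_i)
        # Correction step applied on top of the mixed weights.
        new_weights.append(
            [m + alpha * td_error * f for m, f in zip(mixed, phi_s)]
        )
    return new_weights


# Two agents connected to each other, one feature, terminal next state.
updated = full_neighbor_td_step(
    weights=[[0.0], [2.0]],
    neighbors={0: [1], 1: [0]},
    phi_s=[1.0],
    phi_next=[0.0],
    rewards=[1.0, 1.0],
    alpha=0.5,
    gamma=0.9,
)
print(updated)
```

A gossip-style variant under the same assumptions would replace `group` with the agent and a single randomly chosen neighbor, which is the contrast the abstract draws.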


Pages: 379-389

Page count: 11
