Decentralized Multi-Agent Reinforcement Learning in Average-Reward Dynamic DCOPs

Cited by: 0

Authors:

Duc Thien Nguyen [1]

Yeoh, William [2]

Lau, Hoong Chuin [1]

Zilberstein, Shlomo [3]

Zhang, Chongjie [4]

Affiliations:

[1] Singapore Management Univ, Sch Informat Syst, Singapore, Singapore

[2] New Mexico State Univ, Dept Comp Sci, Las Cruces, NM 88003 USA

[3] Univ Massachusetts, Sch Comp Sci, Amherst, MA 01003 USA

[4] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA

Source:

PROCEEDINGS OF THE TWENTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2014

Funding:

National Research Foundation, Singapore; U.S. National Science Foundation

Keywords:

ALGORITHMS;

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial Intelligence Theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Researchers have introduced the Dynamic Distributed Constraint Optimization Problem (Dynamic DCOP) formulation to model dynamically changing multi-agent coordination problems, where a Dynamic DCOP is a sequence of (static canonical) DCOPs, each partially different from the DCOP preceding it. Existing work typically assumes that the problem in each time step is decoupled from the problems in other time steps, which might not hold in some applications. Therefore, in this paper, we make the following contributions: (i) We introduce a new model, called Markovian Dynamic DCOPs (MD-DCOPs), where the DCOP in the next time step is a function of the value assignments in the current time step; (ii) We introduce two distributed reinforcement learning algorithms, the Distributed RVI Q-learning algorithm and the Distributed R-learning algorithm, that balance exploration and exploitation to solve MD-DCOPs in an online manner; and (iii) We empirically evaluate them against an existing multi-armed bandit DCOP algorithm on Dynamic DCOPs.
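
The abstract names R-learning as the basis of one of the two distributed algorithms. As a rough illustration only, the sketch below shows the classic single-agent, tabular R-learning update for average-reward problems with epsilon-greedy exploration. It is not the paper's Distributed R-learning algorithm and does not model a DCOP. The toy two-state environment, the names `REWARD`, `step`, and `r_learning`, and all parameter values are assumptions made for this example.

```python
import random

# Hypothetical toy MDP (not from the paper): two states, two actions.
# Taking action a deterministically moves the agent to state a.
REWARD = [
    [1.0, 0.0],  # state 0: staying pays 1, moving to state 1 pays 0
    [0.0, 2.0],  # state 1: moving to state 0 pays 0, staying pays 2
]


def step(state, action):
    """Return (reward, next_state) for the toy MDP."""
    return REWARD[state][action], action


def r_learning(steps=50_000, alpha=0.1, beta=0.05, epsilon=0.2, seed=0):
    """Tabular R-learning: learn relative action values q and the
    average reward per step rho. All defaults are illustrative."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    rho = 0.0
    state = 0
    for _ in range(steps):
        # Epsilon-greedy: explore with probability epsilon, else exploit.
        if rng.random() < epsilon:
            action = rng.randrange(2)
        else:
            action = max(range(2), key=lambda a: q[state][a])
        reward, nxt = step(state, action)
        # Value update uses the reward relative to the average reward rho.
        td_error = reward - rho + max(q[nxt]) - q[state][action]
        q[state][action] += alpha * td_error
        # The average-reward estimate is updated only on greedy actions.
        if q[state][action] == max(q[state]):
            rho += beta * (reward - rho + max(q[nxt]) - max(q[state]))
        state = nxt
    return q, rho


if __name__ == "__main__":
    q, rho = r_learning()
    policy = [max(range(2), key=lambda a: q[s][a]) for s in range(2)]
    print("greedy policy:", policy, "estimated average reward:", rho)
```

In this toy problem the best long-run behaviour is to move to state 1 and stay there, which yields an average reward of 2 per step even though staying in state 0 pays more immediately. The exploration rate is what lets the learner discover this, which is the exploration and exploitation trade-off the abstract refers to.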

Pages: 1447-1455

Page count: 9
