Approximate linear programming for decentralized policy iteration in cooperative multi-agent Markov decision processes

Cited by: 0
Authors
Mandal, Lakshmi [1 ]
Lakshminarayanan, Chandrashekar [2 ]
Bhatnagar, Shalabh [1 ]
Affiliations
[1] Indian Inst Sci, Dept Comp Sci & Automat, Bangalore 560012, India
[2] Indian Inst Technol Madras, Dept Comp Sci & Engn, Chennai 600036, India
Keywords
Markov decision process; Cooperative multi-agent systems; Approximate linear programming; Policy iteration
DOI
10.1016/j.sysconle.2024.106003
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this work, we consider a 'cooperative' multi-agent Markov decision process (MDP) involving m (> 1) agents. At each decision epoch, all m agents independently select actions in order to minimize a common long-term cost objective. In the policy iteration process for the multi-agent setup, the number of joint actions grows exponentially with the number of agents, incurring huge computational costs. Recent works therefore consider decentralized policy improvement, where each agent improves its decisions unilaterally, assuming that the decisions of the other agents are fixed. However, these works rely on exact value functions, which are computationally expensive to obtain when the number of agents is large and the state-action space is high dimensional. We therefore propose approximate decentralized policy iteration algorithms that use approximate linear programming with function approximation to compute the approximate value function employed in decentralized policy improvement. Further, we consider both finite-horizon and infinite-horizon discounted cooperative multi-agent MDPs and propose suitable algorithms in each case. Moreover, we provide theoretical guarantees for our algorithms and demonstrate their advantages over existing state-of-the-art algorithms in the literature.
Pages: 9
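The abstract describes an approximate value function obtained via approximate linear programming, followed by decentralized (agent-by-agent) policy improvement. The sketch below is a minimal illustration of that combination under simplifying assumptions: a small tabular state space, a known cost and transition model, and a linear architecture V(s) ≈ φ(s)ᵀr whose weights come from the standard approximate linear program for cost-minimization MDPs. The function names and array layouts here are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only (not the authors' implementation): solve a standard
# approximate LP for the value-function weights r, then let each agent improve
# its own action unilaterally while the other agents' actions stay fixed.
import numpy as np
from scipy.optimize import linprog


def joint_action_index(joint, actions_per_agent):
    """Flatten a joint action (a_1, ..., a_m) into a single row-major index."""
    idx = 0
    for a_i, n_i in zip(joint, actions_per_agent):
        idx = idx * n_i + a_i
    return idx


def alp_weights(phi, cost, P, c, gamma=0.95):
    """Approximate-LP weights r for V ~= phi @ r (cost-minimization MDP).

    Solves  max_r  c^T (phi r)  s.t.  (phi r)(s) <= g(s,a) + gamma * P(.|s,a)^T (phi r)
    for all states s and flattened joint actions a.  Assumes phi has a constant
    column so the LP is bounded; c holds nonnegative state-relevance weights.
    """
    n_states, n_joint = cost.shape
    rows = [phi[s] - gamma * P[s, a] @ phi
            for s in range(n_states) for a in range(n_joint)]
    rhs = [cost[s, a] for s in range(n_states) for a in range(n_joint)]
    res = linprog(c=-(c @ phi), A_ub=np.array(rows), b_ub=np.array(rhs),
                  bounds=[(None, None)] * phi.shape[1])
    return res.x


def decentralized_improvement(policy, r, phi, cost, P, actions_per_agent, gamma=0.95):
    """One sweep of decentralized policy improvement over the approximate value phi @ r.

    policy[s, i] is agent i's action in state s; cost[s, a] and P[s, a, s'] are
    indexed by the flattened joint action a.
    """
    n_states, m = policy.shape
    V = phi @ r                                  # approximate value of every state
    new_policy = policy.copy()
    for i in range(m):                           # each agent improves unilaterally
        for s in range(n_states):
            best_a, best_q = new_policy[s, i], np.inf
            for a_i in range(actions_per_agent[i]):
                joint = new_policy[s].copy()
                joint[i] = a_i                   # other agents' actions stay fixed
                a = joint_action_index(joint, actions_per_agent)
                q = cost[s, a] + gamma * P[s, a] @ V
                if q < best_q:
                    best_a, best_q = a_i, q
            new_policy[s, i] = best_a
    return new_policy
```

In the improvement step each agent searches only over its own |A_i| actions per state, so one sweep costs on the order of Σ_i |A_i| evaluations of the approximate Q-value rather than the Π_i |A_i| evaluations a joint-action minimization would need; the cost and transition arrays above are still indexed by joint actions purely to keep the sketch self-contained.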