Multi-agent reinforcement learning via distributed MPC as a function approximator

Cited by: 3

Authors:

Mallick, Samuel [1]

Airaldi, Filippo [1]

Dabiri, Azita [1]

De Schutter, Bart [1]

Affiliations:

[1] Delft Univ Technol, Delft Ctr Syst & Control, Delft, Netherlands

Source:

AUTOMATICA | 2024 / Vol. 167

Funding:

European Union Horizon 2020; European Research Council;

Keywords:

Multi-agent reinforcement learning; Distributed model predictive control; Networked systems; ADMM; MODEL-PREDICTIVE CONTROL;

DOI:

10.1016/j.automatica.2024.111803

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. The current paper is the first work to extend this idea to the multi-agent setting. We propose the use of a distributed MPC scheme as a function approximator, with a structure allowing for distributed learning and deployment. We then show that Q-learning updates can be performed distributively without introducing nonstationarity, by reconstructing a centralized learning update. The effectiveness of the approach is demonstrated on a numerical example.


Pages: 9
