paper Multi-agent reinforcement learning via distributed MPC as a function approximator

Cited by: 2
Authors
Mallick, Samuel [1]
Airaldi, Filippo [1]
Dabiri, Azita [1]
De Schutter, Bart [1]
Affiliations
[1] Delft Univ Technol, Delft Ctr Syst & Control, Delft, Netherlands
Funding
European Research Council; EU Horizon 2020;
Keywords
Multi-agent reinforcement learning; Distributed model predictive control; Networked systems; ADMM; MODEL-PREDICTIVE CONTROL;
DOI
10.1016/j.automatica.2024.111803
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
This paper presents a novel approach to multi-agent reinforcement learning (RL) for linear systems with convex polytopic constraints. Existing work on RL has demonstrated the use of model predictive control (MPC) as a function approximator for the policy and value functions. The current paper is the first work to extend this idea to the multi-agent setting. We propose the use of a distributed MPC scheme as a function approximator, with a structure allowing for distributed learning and deployment. We then show that Q-learning updates can be performed distributively without introducing nonstationarity, by reconstructing a centralized learning update. The effectiveness of the approach is demonstrated on a numerical example. (c) 2024 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
Pages: 9
Related papers
26 in total
[1]   Learning safety in model-based Reinforcement Learning using MPC and Gaussian Processes [J].
Airaldi, Filippo ;
De Schutter, Bart ;
Dabiri, Azita .
IFAC PAPERSONLINE, 2023, 56 (02) :5759-5764
[2]   Dynamic energy scheduling and routing of a large fleet of electric vehicles using multi-agent reinforcement learning [J].
Alqahtani, Mohammed ;
Scott, Michael J. ;
Hu, Mengqi .
COMPUTERS & INDUSTRIAL ENGINEERING, 2022, 169
[3]   Deep Reinforcement Learning: A brief survey [J].
Arulkumaran, Kai ;
Deisenroth, Marc Peter ;
Brundage, Miles ;
Bharath, Anil Anthony .
IEEE SIGNAL PROCESSING MAGAZINE, 2017, 34 (06) :26-38
[4]  
Borrelli F., 2016, Predictive control for linear and hybrid systems
[5]   Distributed optimization and statistical learning via the alternating direction method of multipliers [J].
Boyd S. ;
Parikh N. ;
Chu E. ;
Peleato B. ;
Eckstein J. .
Foundations and Trends in Machine Learning, 2010, 3 (01) :1-122
[6]  
Büskens C, 2001, ONLINE OPTIMIZATION OF LARGE SCALE SYSTEMS, P3
[7]   A comprehensive survey of multiagent reinforcement learning [J].
Busoniu, Lucian ;
Babuska, Robert ;
De Schutter, Bart .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2008, 38 (02) :156-172
[8]   On constrained infinite-time linear quadratic optimal control [J].
Chmielewski, D ;
Manousiouthakis, V .
SYSTEMS & CONTROL LETTERS, 1996, 29 (03) :121-129
[9]  
Foerster J, 2017, PR MACH LEARN RES, V70
[10]   Learning for MPC with stability & safety guarantees [J].
Gros, Sebastien ;
Zanon, Mario .
AUTOMATICA, 2022, 146