Learning Optimal Policies in Mean Field Models with Kullback-Leibler Regularization

Cited by: 1
Authors
Busic, Ana [1 ,2 ]
Meyn, Sean [3 ,4 ]
Cammardella, Neil [5 ]
Affiliations
[1] PSL Res Univ, CNRS, Inria, Ecole Normale Super, Paris, France
[2] PSL Res Univ, CNRS, DI ENS, Ecole Normale Super, Paris, France
[3] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[4] Inria Int Chair, Paris, France
[5] Emera Technol LLC, Applicat Engn, Halifax, NS, Canada
Source
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC | 2023
Funding
U.S. National Science Foundation;
Keywords
MARKOV DECISION-PROCESSES; STOCHASTIC-APPROXIMATION; LOADS;
DOI
10.1109/CDC49753.2023.10383868
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
The theory and application of mean field games has grown significantly since its origins less than two decades ago. This paper considers a special class in which the game is cooperative, and the cost includes a control penalty defined by Kullback-Leibler divergence, as commonly used in reinforcement learning and other fields. The KL divergence is often preferred as a control cost or regularizer because it leads to an attractive solution structure. This paper considers a particular control paradigm called Kullback-Leibler Quadratic (KLQ) optimal control, and arrives at the following conclusions: 1. In application to distributed control of electric loads, a new modeling technique is introduced to obtain a simple Markov model for each load (the 'agent' in mean field theory). 2. It is argued that the optimality equations may be solved using Monte Carlo techniques, a specialized version of stochastic gradient descent (SGD). 3. The use of averaging minimizes the asymptotic covariance in the SGD algorithm; the form of the optimal covariance is identified for the first time.
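Conclusion 3 refers to iterate averaging (Polyak-Ruppert averaging) in SGD. A minimal sketch of the idea on a toy quadratic objective; the quadratic target, step-size schedule, and Gaussian gradient noise below are illustrative assumptions, not the paper's KLQ optimality equations:

```python
import numpy as np

def averaged_sgd(grad, theta0, n_iters, step, rng):
    """SGD with Polyak-Ruppert averaging: run noisy gradient steps on
    theta and maintain the running average theta_bar of the iterates."""
    theta = np.asarray(theta0, dtype=float)
    theta_bar = theta.copy()
    for n in range(1, n_iters + 1):
        g = grad(theta) + rng.standard_normal(theta.shape)  # noisy gradient
        theta = theta - step(n) * g
        theta_bar += (theta - theta_bar) / n  # running average of theta_1..theta_n
    return theta, theta_bar

rng = np.random.default_rng(0)
target = np.array([1.0])
theta_last, theta_avg = averaged_sgd(
    grad=lambda th: th - target,   # gradient of 0.5*||th - target||^2
    theta0=np.zeros(1),
    n_iters=20000,
    step=lambda n: 1.0 / n**0.7,   # slowly decaying steps; averaging restores the optimal rate
    rng=rng,
)
```

The step size decays slower than 1/n, which alone would give a suboptimal asymptotic covariance; averaging the iterates recovers the optimal covariance, the quantity the paper characterizes for its KLQ setting.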
Pages: 38-45
Page count: 8