Learning Optimal Policies in Mean Field Models with Kullback-Leibler Regularization

Cited by: 1
Authors
Busic, Ana [1 ,2 ]
Meyn, Sean [3 ,4 ]
Cammardella, Neil [5 ]
Affiliations
[1] PSL Res Univ, CNRS, Inria, Ecole Normale Super, Paris, France
[2] PSL Res Univ, CNRS, DI ENS, Ecole Normale Super, Paris, France
[3] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA
[4] Inria Int Chair, Paris, France
[5] Emera Technol LLC, Applicat Engn, Halifax, NS, Canada
Source
2023 62ND IEEE CONFERENCE ON DECISION AND CONTROL, CDC | 2023
Funding
U.S. National Science Foundation;
Keywords
MARKOV DECISION-PROCESSES; STOCHASTIC-APPROXIMATION; LOADS;
DOI
10.1109/CDC49753.2023.10383868
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
The theory and application of mean field games has grown significantly since its origins less than two decades ago. This paper considers a special class in which the game is cooperative, and the cost includes a control penalty defined by Kullback-Leibler divergence, as commonly used in reinforcement learning and other fields. The KL divergence is often preferred as a control cost or regularizer because it leads to an attractive solution structure. This paper considers a particular control paradigm called Kullback-Leibler Quadratic (KLQ) optimal control, and arrives at the following conclusions: 1. In application to distributed control of electric loads, a new modeling technique is introduced to obtain a simple Markov model for each load (the 'agent' in mean field theory). 2. It is argued that the optimality equations may be solved using Monte Carlo techniques, a specialized version of stochastic gradient descent (SGD). 3. The use of averaging minimizes the asymptotic covariance in the SGD algorithm; the form of the optimal covariance is identified for the first time.
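Conclusion 3 refers to iterate averaging (Polyak-Ruppert averaging) in SGD. A minimal sketch of the idea on a toy quadratic objective; the quadratic target, step-size schedule, and Gaussian gradient noise below are illustrative assumptions, not the paper's KLQ optimality equations:

```python
import numpy as np

def averaged_sgd(grad, theta0, n_iters, step, rng):
    """SGD with Polyak-Ruppert averaging: run noisy gradient steps on
    theta and maintain the running average theta_bar of the iterates."""
    theta = np.asarray(theta0, dtype=float)
    theta_bar = theta.copy()
    for n in range(1, n_iters + 1):
        g = grad(theta) + rng.standard_normal(theta.shape)  # noisy gradient
        theta = theta - step(n) * g
        theta_bar += (theta - theta_bar) / n  # running average of theta_1..theta_n
    return theta, theta_bar

rng = np.random.default_rng(0)
target = np.array([1.0])
theta_last, theta_avg = averaged_sgd(
    grad=lambda th: th - target,   # gradient of 0.5*||th - target||^2
    theta0=np.zeros(1),
    n_iters=20000,
    step=lambda n: 1.0 / n**0.7,   # slowly decaying steps; averaging restores the optimal rate
    rng=rng,
)
```

The step size decays slower than 1/n, which alone would give a suboptimal asymptotic covariance; averaging the iterates recovers the optimal covariance, the quantity the paper characterizes for its KLQ setting.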
Pages: 38-45
Page count: 8