Unified reinforcement Q-learning for mean field game and control problems

Cited by: 30
Authors
Angiuli, Andrea [1 ]
Fouque, Jean-Pierre [1 ]
Lauriere, Mathieu [2 ]
Affiliations
[1] Univ Calif Santa Barbara, Dept Stat & Appl Probabil, South Hall 5504, Santa Barbara, CA 93106 USA
[2] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08544 USA
Keywords
Q-learning; Mean field game; Mean field control; Timescales; Linear-quadratic control
DOI
10.1007/s00498-021-00310-1
Chinese Library Classification (CLC)
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
We present a Reinforcement Learning (RL) algorithm to solve infinite horizon asymptotic Mean Field Game (MFG) and Mean Field Control (MFC) problems. Our approach can be described as a unified two-timescale Mean Field Q-learning: the same algorithm learns either the MFG or the MFC solution simply by tuning the ratio of two learning rates. The algorithm operates in discrete time and space, where the agent provides the environment not only with an action but also with a distribution of the state, in order to account for the mean field feature of the problem. Importantly, we assume that the agent cannot observe the population's distribution and needs to estimate it in a model-free manner. The asymptotic MFG and MFC problems are also presented in continuous time and space and compared with classical (non-asymptotic or stationary) MFG and MFC problems. They lead to explicit solutions in the linear-quadratic (LQ) case, which are used as benchmarks for the results of our algorithm.
Pages: 217-271 (55 pages)
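
To make the two-timescale mechanism concrete, the following is a minimal Python sketch of the idea described in the abstract: a Q-table and a model-free estimate of the state distribution are updated with different learning rates, and the ratio of the two rates selects which solution is targeted. The toy environment (`step`), the learning-rate exponents `omega_Q` and `omega_mu`, and all numerical values are illustrative assumptions rather than the paper's setup; the labeling of the regimes (distribution on the slower timescale for MFG, on the faster timescale for MFC) follows the paper's two-timescale intuition and should be read as a sketch, not a reference implementation.

```python
import numpy as np

# Toy mean-field environment (an assumption for illustration, not the paper's
# model): finite state space arranged on a ring, with a congestion-type reward
# that depends on the estimated state distribution mu.
n_states, n_actions = 5, 3
gamma = 0.9  # discount factor
rng = np.random.default_rng(0)

def step(x, a, mu):
    """Hypothetical dynamics: actions move left/stay/right on the ring;
    the reward penalizes distance from the center and crowding at x."""
    x_next = (x + a - 1) % n_states
    reward = -abs(x - n_states // 2) - 2.0 * mu[x]
    return x_next, reward

def unified_mfq(omega_Q, omega_mu, n_iter=100_000, eps=0.1):
    """Unified two-timescale mean-field Q-learning (sketch).

    Learning rates decay as rho_Q = 1/(1+n)^omega_Q and
    rho_mu = 1/(1+n)^omega_mu. Choosing omega_mu > omega_Q makes the
    distribution the slow variable (MFG regime); omega_mu < omega_Q makes
    it the fast variable (MFC regime).
    """
    Q = np.zeros((n_states, n_actions))
    mu = np.full(n_states, 1.0 / n_states)  # model-free distribution estimate
    x = 0
    for n in range(n_iter):
        rho_Q = 1.0 / (1 + n) ** omega_Q
        rho_mu = 1.0 / (1 + n) ** omega_mu
        # epsilon-greedy exploration on the current Q-table
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[x].argmax())
        x_next, r = step(x, a, mu)
        # Q-update on its own timescale
        Q[x, a] += rho_Q * (r + gamma * Q[x_next].max() - Q[x, a])
        # distribution update on the other timescale, from visited states only
        delta = np.zeros(n_states)
        delta[x_next] = 1.0
        mu += rho_mu * (delta - mu)
        x = x_next
    return Q, mu

# MFG regime: distribution updated more slowly than Q (rho_mu << rho_Q)
Q_mfg, mu_mfg = unified_mfq(omega_Q=0.55, omega_mu=0.85)
# MFC regime: distribution updated faster than Q (rho_mu >> rho_Q)
Q_mfc, mu_mfc = unified_mfq(omega_Q=0.85, omega_mu=0.55)
print("MFG regime distribution estimate:", np.round(mu_mfg, 3))
print("MFC regime distribution estimate:", np.round(mu_mfc, 3))
```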