MEAN-FIELD MARKOV DECISION PROCESSES WITH COMMON NOISE AND OPEN-LOOP CONTROLS

Cited by: 14

Authors:

Motte, Mederic [1]

Pham, Huyen [1]

Affiliations:

[1] Univ Paris, LPSM, Paris, France

Source:

ANNALS OF APPLIED PROBABILITY | 2022 / Vol. 32 / No. 02

Keywords:

Mean-field; Markov decision process; conditional propagation of chaos; measurable coupling; randomized control; CONVERGENCE;

DOI:

10.1214/21-AAP1713

Chinese Library Classification (CLC) number:

O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];

Discipline classification code:

020208 ; 070103 ; 0714 ;

Abstract:

We develop an exhaustive study of Markov decision process (MDP) under mean field interaction both on states and actions in the presence of common noise, and when optimization is performed over open-loop controls on infinite horizon. Such model, called CMKV-MDP for conditional McKean-Vlasov MDP, arises and is obtained here rigorously with a rate of convergence as the asymptotic problem of N-cooperative agents controlled by a social planner/influencer that observes the environment noises but not necessarily the individual states of the agents. We highlight the crucial role of relaxed controls and randomization hypothesis for this class of models with respect to classical MDP theory. We prove the correspondence between CMKV-MDP and a general lifted MDP on the space of probability measures, and establish the dynamic programming Bellman fixed point equation satisfied by the value function, as well as the existence of ε-optimal randomized feedback controls. The arguments of proof involve an original measurable optimal coupling for the Wasserstein distance. This provides a procedure for learning strategies in a large population of interacting collaborative agents.

Pages: 1421 - 1458

Page count: 38

39 records in total

[21] On mean field games with common noise and McKean-Vlasov SPDEs
Kolokoltsov, Vassili N.
Troeva, Marianna
STOCHASTIC ANALYSIS AND APPLICATIONS, 2019, 37 (04) : 522 - 549
[22] An optimistic value iteration for mean-variance optimization in discounted Markov decision processes
Ma, Shuai
Ma, Xiaoteng
Xia, Li
RESULTS IN CONTROL AND OPTIMIZATION, 2022, 8
[23] A unified algorithm framework for mean-variance optimization in discounted Markov decision processes
Ma, Shuai
Ma, Xiaoteng
Xia, Li
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2023, 311 (03) : 1057 - 1067
[24] Propagation of Chaos for Point Processes Induced by Particle Systems with Mean-Field Drift Interaction
Kolliopoulos, Nikolaos
Larsson, Martin
Zhang, Zeyu
JOURNAL OF THEORETICAL PROBABILITY, 2025, 38 (01)
[25] A FAST ITERATIVE PDE-BASED ALGORITHM FOR FEEDBACK CONTROLS OF NONSMOOTH MEAN-FIELD CONTROL PROBLEMS
Reisinger, Christoph
Stockinger, Wolfgang
Zhang, Yufei
SIAM JOURNAL ON SCIENTIFIC COMPUTING, 2024, 46 (04) : A2737 - A2773
[26] Selection by vanishing common noise for potential finite state mean field games
Cecchin, Alekos
Delarue, Francois
COMMUNICATIONS IN PARTIAL DIFFERENTIAL EQUATIONS, 2022, 47 (01) : 89 - 168
[27] Modeling the Decision and Coordination Mechanism of Power Battery Closed-Loop Supply Chain Using Markov Decision Processes
Zhang, Huanyong
Li, Ningshu
Lin, Jinghan
SUSTAINABILITY, 2024, 16 (11)
[28] Characterization of Mean-Field Type H_ Index for Continuous-Time Stochastic Systems with Markov Jump
Ma, Limin
Song, Caixia
Zhang, Weihai
Liu, Zhenbin
PROCESSES, 2022, 10 (08)
[29] Maximum principle for mean-field optimal control problems of delayed stochastic systems involving continuous and impulse controls
Zhang, Qixia
2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 1672 - 1678
[30] Heuristic mean-variance optimization in Markov decision processes using state-dependent risk aversion
Schlosser, Rainer
IMA JOURNAL OF MANAGEMENT MATHEMATICS, 2022, 33 (02) : 181 - 199
