Mean Field for Markov Decision Processes: From Discrete to Continuous Optimization

Cited by: 34
Authors
Gast, Nicolas [1]
Gaujal, Bruno [2,3]
Le Boudec, Jean-Yves [1]
Affiliations
[1] EPFL IC LCA2, CH-1015 Lausanne, Switzerland
[2] INRIA Grenoble Rhone Alpes, F-38330 Montbonnot St Martin, France
[3] LIG, F-38330 Montbonnot St Martin, France
Keywords
Epidemic model; Hamilton-Jacobi-Bellman (HJB); Markov decision processes; mean field; optimal control; approximation; dynamics
DOI
10.1109/TAC.2012.2186176
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification number
0812;
Abstract
We study the convergence of Markov decision processes composed of a large number of objects to optimization problems on ordinary differential equations. We show that the optimal reward of such a Markov decision process, which satisfies a Bellman equation, converges to the solution of a continuous Hamilton-Jacobi-Bellman (HJB) equation based on the mean field approximation of the Markov decision process. We give bounds on the difference of the rewards and an algorithm for deriving an approximating solution to the Markov decision process from a solution of the HJB equations. We illustrate the method on three examples pertaining, respectively, to investment strategies, population dynamics control, and scheduling in queues. These examples illustrate and justify the construction of the controlled ODE and show the advantage of solving a continuous HJB equation rather than a large discrete Bellman equation.
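As a rough sketch of the continuous limit described in the abstract (the notation here is illustrative and not taken from the paper): if m(t) denotes the empirical occupancy measure of the N objects, \alpha a control, f the mean-field drift, r the instantaneous reward, and V the value function, the controlled ODE and an HJB equation of the kind referred to above take the form
\[
\dot m(t) = f\bigl(m(t),\alpha(t)\bigr), \qquad
\frac{\partial V}{\partial t}(m,t) + \sup_{\alpha}\Bigl\{ r(m,\alpha) + \nabla_m V(m,t)\cdot f\bigl(m,\alpha\bigr) \Bigr\} = 0,
\]
with a suitable terminal condition on V. The paper's result, as stated in the abstract, is that the optimal reward of the N-object Markov decision process converges to the solution of such an HJB equation, with bounds on the difference of the rewards.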
Pages: 2266-2280
Number of pages: 15