Max-Plus Linear Approximations for Deterministic Continuous-State Markov Decision Processes

被引:5
作者
Berthier, Eloise [1 ,2 ]
Bach, Francis [1 ,2 ]
机构
[1] PSL Res Univ, INRIA, F-75012 Paris, France
[2] PSL Res Univ, Ecole Normale Super, Dept Informat, F-75012 Paris, France
来源
IEEE CONTROL SYSTEMS LETTERS | 2020年 / 4卷 / 03期
关键词
Dictionaries; Heuristic algorithms; Aerospace electronics; Markov processes; Approximation algorithms; Linear approximation; dynamic programming;
D O I
10.1109/LCSYS.2020.2973199
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We consider deterministic continuous-state Markov decision processes (MDPs). We apply a max-plus linear method to approximate the value function with a specific dictionary of functions that leads to an adequate state-discretization of the MDP. This is more efficient than a direct discretization of the state space, typically intractable in high dimension. We propose a simple strategy to adapt the discretization to a problem instance, thus mitigating the curse of dimensionality. We provide numerical examples showing that the method works well on simple MDPs.
引用
收藏
页码:767 / 772
页数:6
相关论文
共 17 条
[1]   The max-plus finite element method for solving deterministic optimal control problems: Basic properties and convergence analysis [J].
Akian, Marianne ;
Gaubert, Stephane ;
Lakhoua, Asma .
SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2008, 47 (02) :817-848
[2]  
[Anonymous], 2011, Calculus of variations and optimal control theory: A concise introduction
[3]  
[Anonymous], WORKING PAPER
[4]  
[Anonymous], 2016, OPENAI GYM
[5]  
[Anonymous], P 21 ANN C LEARN THE
[6]   Duality and separation theorems in idempotent semimodules [J].
Cohen, G ;
Gaubert, S ;
Quadrat, JP .
LINEAR ALGEBRA AND ITS APPLICATIONS, 2004, 379 :395-422
[7]  
Finkel R. A., 1974, Acta Informatica, V4, P1, DOI 10.1007/BF00288933
[8]  
Fleming W.H., 2006, CONTROLLED MARKOV PR, V2, DOI [10.1007/0-387-31071-1, DOI 10.1007/0-387-31071-1]
[9]  
Gaubert S, 1997, LECT NOTES COMPUT SC, V1200, P261, DOI 10.1007/BFb0023465
[10]  
Gaubert S, 2011, IEEE DECIS CONTR P, P1054, DOI 10.1109/CDC.2011.6161386