Finite-horizon variance penalised Markov decision processes

Cited by: 7
Authors
Collins E.J. [1]
Affiliations
[1] Department of Mathematics, University of Bristol
Keywords
Convex polytopes; Markov decision processes; Mean-variance tradeoff; Variance penalty
DOI
10.1007/BF01539805
Abstract
We consider a finite-horizon Markov decision process with only terminal rewards. We describe a finite algorithm for computing a Markov deterministic policy which maximises the variance penalised reward, and we outline a vertex elimination algorithm which can reduce the computation involved. © Springer-Verlag 1997.
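To make the objective in the abstract concrete, the following is a minimal brute-force sketch, not the algorithm of the paper. It assumes the variance penalised reward is the mean of the terminal reward minus a penalty weight `theta` times its variance, and it simply enumerates every Markov deterministic policy. All names (`P`, `init`, `reward`, `theta`, `brute_force`) are hypothetical and chosen for illustration only.

```python
from itertools import product


def terminal_distribution(policy, P, init, horizon):
    """Distribution of the final state under a Markov deterministic policy.

    policy[t][s] is the action taken in state s at time t (assumed layout).
    P[a][s][s2] is the probability of moving from s to s2 under action a.
    """
    n = len(init)
    dist = list(init)
    for t in range(horizon):
        nxt = [0.0] * n
        for s in range(n):
            a = policy[t][s]
            for s2 in range(n):
                nxt[s2] += dist[s] * P[a][s][s2]
        dist = nxt
    return dist


def penalised_value(dist, reward, theta):
    """Mean of the terminal reward minus theta times its variance."""
    mean = sum(p * r for p, r in zip(dist, reward))
    second_moment = sum(p * r * r for p, r in zip(dist, reward))
    return mean - theta * (second_moment - mean * mean)


def brute_force(P, init, reward, horizon, theta):
    """Enumerate all Markov deterministic policies and keep the best one."""
    n_actions, n_states = len(P), len(init)
    # A decision rule assigns one action to every state.
    rules = list(product(range(n_actions), repeat=n_states))
    best_value, best_policy = float("-inf"), None
    # A policy is one decision rule per time step.
    for policy in product(rules, repeat=horizon):
        dist = terminal_distribution(policy, P, init, horizon)
        value = penalised_value(dist, reward, theta)
        if value > best_value:
            best_value, best_policy = value, policy
    return best_value, best_policy
```

The number of policies grows exponentially with the horizon and the number of states, so this enumeration is only practical for very small examples. The variance term depends on the whole distribution of the final state, which is why the sketch evaluates complete policies rather than building one up stage by stage.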
Pages: 35-39
Page count: 4
Related papers
8 in total
[1] Collins E.J., McNamara J.M., Finite-horizon Dynamic Optimisation When the Terminal Reward Is a Concave Functional of the Distribution of the Final State, (1985)
[2] Derman C., Finite State Markovian Decision Processes, (1970)
[3] Filar J.A., Kallenberg L.C.M., Lee H.M., Variance-penalised Markov decision processes, Math Oper Res, 14, pp. 147-161, (1989)
[4] Huang Y., Kallenberg L.C.M., On finding optimal policies for Markov decision chains: A unifying framework for mean-variance-tradeoffs, Math Oper Res, 19, pp. 434-448, (1994)
[5] McMullen P., Shephard G.C., Convex Polytopes and the Upper Bound Conjecture, 3, (1971)
[6] Sobel M.J., The variance of a discounted Markov decision process, J Appl Probab, 19, pp. 794-802, (1982)
[7] White D.J., Computational approaches to variance penalised Markov decision processes, OR Spektrum, 14, pp. 79-83, (1992)
[8] White D.J., A mathematical programming approach to a problem in variance penalised Markov decision processes, OR Spektrum, 15, pp. 225-230, (1993)