Finite-horizon variance penalised Markov decision processes

Cited by: 7

Authors:

Collins E.J. [1]

Affiliations:

[1] Department of Mathematics, University of Bristol

Source:

Operations-Research-Spektrum | 1997 / Vol. 19 / No. 1

Keywords:

Convex polytopes; Markov decision processes; Mean-variance tradeoff; Variance penalty

DOI:

10.1007/BF01539805

Abstract:

We consider a finite-horizon Markov decision process with only terminal rewards. We describe a finite algorithm for computing a Markov deterministic policy which maximises the variance-penalised reward, and we outline a vertex elimination algorithm which can reduce the computation involved. © Springer-Verlag 1997.

Pages: 35-39

Page count: 4

References (8 in total):

[1]

Collins E.J., McNamara J.M., Finite-horizon Dynamic Optimisation When the Terminal Reward Is a Concave Functional of the Distribution of the Final State, (1985)

[2]

Derman C., Finite State Markovian Decision Processes, (1970)

[3]

Filar J.A., Kallenberg L.C.M., Lee H.M., Variance-penalised Markov decision processes, Math Oper Res, 14, pp. 147-161, (1989)

[4]

Huang Y., Kallenberg L.C.M., On finding optimal policies for Markov decision chains: A unifying framework for mean-variance-tradeoffs, Math Oper Res, 19, pp. 434-448, (1994)

[5]

McMullen P., Shephard G.C., Convex Polytopes and the Upper Bound Conjecture, 3, (1971)

[6]

Sobel M.J., The variance of a discounted Markov decision process, J Appl Probab, 19, pp. 794-802, (1982)

[7]

White D.J., Computational approaches to variance penalised Markov decision processes, OR Spektrum, 14, pp. 79-83, (1992)

[8]

White D.J., A mathematical programming approach to a problem in variance penalised Markov decision processes, OR Spektrum, 15, pp. 225-230, (1993)