[1]
Collins E.J., McNamara J.M., Finite-horizon Dynamic Optimisation When the Terminal Reward Is a Concave Functional of the Distribution of the Final State, (1985)
[2]
Derman C., Finite State Markovian Decision Processes, (1970)
[3]
Filar J.A., Kallenberg L.C.M., Lee H.M., Variance-penalised Markov decision processes, Math Oper Res, 14, pp. 147-161, (1989)
[4]
Huang Y., Kallenberg L.C.M., On finding optimal policies for Markov decision chains: A unifying framework for mean-variance-tradeoffs, Math Oper Res, 19, pp. 434-448, (1994)
[5]
McMullen P., Shephard G.C., Convex Polytopes and the Upper Bound Conjecture, 3, (1971)
[6]
Sobel M.J., The variance of a discounted Markov decision process, J Appl Probab, 19, pp. 794-802, (1982)
[7]
White D.J., Computational approaches to variance penalised Markov decision processes, OR Spektrum, 14, pp. 79-83, (1992)
[8]
White D.J., A mathematical programming approach to a problem in variance penalised Markov decision processes, OR Spektrum, 15, pp. 225-230, (1993)