ORDINARY DIFFERENTIAL EQUATION METHODS FOR MARKOV DECISION PROCESSES AND APPLICATION TO KULLBACK-LEIBLER CONTROL COST

Cited by: 10

Authors:

Busic, Ana [1,2]

Meyn, Sean [3]

Affiliations:

[1] PSL Res Univ, CNRS, Inria Paris, Ecole Normale Super, Paris, France

[2] PSL Res Univ, CNRS, DI ENS, Ecole Normale Super, Paris, France

[3] Univ Florida, Dept Elect & Comp Engn, Gainesville, FL 32611 USA

Source:

SIAM JOURNAL ON CONTROL AND OPTIMIZATION | 2018 / Vol. 56 / No. 1

Funding:

National Science Foundation (United States);

Keywords:

Markov decision processes; computational methods; distributed control; spectral theory;

DOI:

10.1137/16M1100204

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812;

Abstract:

A new approach to the computation of optimal policies for Markov decision process (MDP) models is introduced. The main idea is to solve not one, but an entire family of MDPs, parameterized by a scalar $\zeta$ that appears in the one-step reward function. For an MDP with $d$ states, the family of relative value functions $\{h^*(\zeta) : \zeta \in \mathbb{R}\}$ is the solution to an ODE, $\frac{d}{d\zeta} h^*(\zeta) = V(h^*(\zeta))$, where the vector field $V\colon \mathbb{R}^d \to \mathbb{R}^d$ has a simple form, based on a matrix inverse. Two general applications are presented: Brockett's quadratic-cost MDP model, and a generalization of the "linearly solvable" MDP framework of Todorov in which the one-step reward function is defined by Kullback-Leibler divergence.


Pages: 343-366

Page count: 24

