Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes

被引：0

作者：

Furmston, Thomas ^{[1
]}

Barber, David ^{[1
]}

机构：

[1] UCL, Dept Comp Sci, London WC1E 6BT, England

来源：

MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT I | 2011年 / 6911卷

关键词：

Markov Decision Processes; Planning; Lagrange Duality;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

Solving finite-horizon Markov Decision Processes with stationary policies is a computationally difficult problem. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary Markov Decision Process solvers and gives an upper-bound on the total expected reward. The empirical performance of the method suggests that not only is it a rapidly convergent algorithm, but that it also performs favourably compared to standard planning algorithms such as policy gradients and lower-bound procedures such as Expectation Maximisation.

引用

页码：487 / 502

页数：16

共 20 条

[1]

[Anonymous], 1999, Athena scientific Belmont

[2]

[Anonymous], 2010, ARTIFICIAL INTELLIGE

[3]

[Anonymous], 2000, Dynamic programming and optimal control

[4]

Barber D., 2011, Bayesian Reasoning and Machine Learning

[5]

Boyd S., 2004, CONVEX OPTIMIZATION, VFirst, DOI DOI 10.1017/CBO9780511804441

[6]

Dearden R, 1998, FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, P761

[7]

Fraley C., 1999, EDIINFRR0934 U WASH

[8]

Furmston T., 2011, UNCERTAINTY ARTIFICI

[9]

Furmston T., 2011, RN1113 U COLL LOND C

[10]

Hoffman M., 2008, ADV NEURAL INFORM PR, P665

← 1 2 →