Lagrange Dual Decomposition for Finite Horizon Markov Decision Processes

被引:0
作者
Furmston, Thomas [1 ]
Barber, David [1 ]
机构
[1] UCL, Dept Comp Sci, London WC1E 6BT, England
来源
MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT I | 2011年 / 6911卷
关键词
Markov Decision Processes; Planning; Lagrange Duality;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Solving finite-horizon Markov Decision Processes with stationary policies is a computationally difficult problem. Our dynamic dual decomposition approach uses Lagrange duality to decouple this hard problem into a sequence of tractable sub-problems. The resulting procedure is a straightforward modification of standard non-stationary Markov Decision Process solvers and gives an upper-bound on the total expected reward. The empirical performance of the method suggests that not only is it a rapidly convergent algorithm, but that it also performs favourably compared to standard planning algorithms such as policy gradients and lower-bound procedures such as Expectation Maximisation.
引用
收藏
页码:487 / 502
页数:16
相关论文
共 20 条
[1]  
[Anonymous], 1999, Athena scientific Belmont
[2]  
[Anonymous], 2010, ARTIFICIAL INTELLIGE
[3]  
[Anonymous], 2000, Dynamic programming and optimal control
[4]  
Barber D., 2011, Bayesian Reasoning and Machine Learning
[5]  
Boyd S., 2004, CONVEX OPTIMIZATION, VFirst, DOI DOI 10.1017/CBO9780511804441
[6]  
Dearden R, 1998, FIFTEENTH NATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE (AAAI-98) AND TENTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICAL INTELLIGENCE (IAAI-98) - PROCEEDINGS, P761
[7]  
Fraley C., 1999, EDIINFRR0934 U WASH
[8]  
Furmston T., 2011, UNCERTAINTY ARTIFICI
[9]  
Furmston T., 2011, RN1113 U COLL LOND C
[10]  
Hoffman M., 2008, ADV NEURAL INFORM PR, P665