Optimal control as a graphical model inference problem

Cited by: 185
Authors
Kappen, Hilbert J. [1]
Gomez, Vicenc [1]
Opper, Manfred [2]
Affiliations
[1] Radboud Univ Nijmegen, Donders Inst Brain Cognit & Behav, NL-6525 EZ Nijmegen, Netherlands
[2] TU Berlin, Dept Comp Sci, D-10587 Berlin, Germany
Keywords
Optimal control; Uncontrolled dynamics; Kullback-Leibler divergence; Graphical model; Approximate inference; Cluster variation method; Belief propagation
DOI
10.1007/s10994-012-5278-7
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We reformulate a class of non-linear stochastic optimal control problems introduced by Todorov (in Advances in Neural Information Processing Systems, vol. 19, pp. 1369-1376, 2007) as a Kullback-Leibler (KL) minimization problem. As a result, the optimal control computation reduces to an inference computation and approximate inference methods can be applied to efficiently compute approximate optimal controls. We show how this KL control theory contains the path integral control method as a special case. We provide an example of a block stacking task and a multi-agent cooperative game where we demonstrate how approximate inference can be successfully applied to instances that are too complex for exact computation. We discuss the relation of the KL control approach to other inference approaches to control.
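As a rough illustration of the KL minimization idea in the abstract, the following is a minimal sketch on a finite set of outcomes, not the authors' implementation. It assumes the cost of a controlled distribution p is KL(p || q) + E_p[R], where q stands for the uncontrolled dynamics and R for a state cost. The names `kl_control`, `kl_cost`, `q`, and `R`, and the numbers used, are hypothetical. Under that assumption the minimizer is p* proportional to q exp(-R), and the optimal cost is -log Z, where Z is the normalizing constant.

```python
import math

def kl_control(q, R):
    """Return the minimizer of KL(p || q) + E_p[R] and its cost.

    q: uncontrolled distribution over a finite set (hypothetical values).
    R: cost of each outcome (hypothetical values).
    """
    # Unnormalized optimal distribution: q weighted by exp(-cost).
    weights = [qi * math.exp(-ri) for qi, ri in zip(q, R)]
    Z = sum(weights)  # normalizing constant (partition function)
    p_star = [w / Z for w in weights]
    return p_star, -math.log(Z)

def kl_cost(p, q, R):
    """Evaluate KL(p || q) + E_p[R] for an arbitrary distribution p."""
    return sum(pi * (math.log(pi / qi) + ri)
               for pi, qi, ri in zip(p, q, R) if pi > 0)

# Toy problem with three outcomes (illustrative numbers only).
q = [0.5, 0.3, 0.2]
R = [2.0, 0.5, 1.0]
p_star, optimal_cost = kl_control(q, R)
```

Evaluating `kl_cost(p_star, q, R)` reproduces `optimal_cost`, while any other distribution, such as `q` itself, gives a cost at least as large. In this toy setting, computing the optimal control amounts to computing a normalized posterior-like distribution, which is the sense in which control becomes inference.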
Pages: 159-182
Number of pages: 24