Optimal control as a graphical model inference problem

Cited by: 185

Authors:

Kappen, Hilbert J. [1]

Gomez, Vicenc [1]

Opper, Manfred [2]

Affiliations:

[1] Radboud Univ Nijmegen, Donders Inst Brain Cognit & Behav, NL-6525 EZ Nijmegen, Netherlands

[2] TU Berlin, Dept Comp Sci, D-10587 Berlin, Germany

Source:

MACHINE LEARNING | 2012, Vol. 87, No. 2

Keywords:

Optimal control; Uncontrolled dynamics; Kullback-Leibler divergence; Graphical model; Approximate inference; Cluster variation method; Belief propagation

DOI:

10.1007/s10994-012-5278-7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We reformulate a class of non-linear stochastic optimal control problems introduced by Todorov (in Advances in Neural Information Processing Systems, vol. 19, pp. 1369-1376, 2007) as a Kullback-Leibler (KL) minimization problem. As a result, the optimal control computation reduces to an inference computation and approximate inference methods can be applied to efficiently compute approximate optimal controls. We show how this KL control theory contains the path integral control method as a special case. We provide an example of a block stacking task and a multi-agent cooperative game where we demonstrate how approximate inference can be successfully applied to instances that are too complex for exact computation. We discuss the relation of the KL control approach to other inference approaches to control.


Pages: 159-182

Page count: 24
