Bayesian Learning of Noisy Markov Decision Processes

Cited by: 4
Authors
Singh, Sumeetpal S. [1 ]
Chopin, Nicolas [2 ,3 ]
Whiteley, Nick [4 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] CREST ENSAE, Paris, France
[3] HEC Paris, Paris, France
[4] Univ Bristol, Sch Math, Bristol BS8 1TW, Avon, England
Source
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION | 2013, Vol. 23, No. 1
Keywords
Data augmentation; parameter expansion; Markov chain Monte Carlo; Markov decision process; Bayesian inference
DOI
10.1145/2414416.2414420
Chinese Library Classification (CLC)
TP39 [Computer applications];
Discipline codes
081203; 0835;
Abstract
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated and how predictions about actions can be made within a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler incorporates a parameter expansion step, which is shown to be essential for good convergence properties of the chain. As an illustration, the method is applied to learning a human controller.
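The abstract outlines the approach at a high level: observed state/action pairs are modelled as draws from a noisy (stochastic) controller for a Markov decision process, and posterior inference is carried out by MCMC. As a rough orientation only, the sketch below implements a generic random-walk Metropolis sampler for a toy version of this setting, with a softmax ("noisy") policy and an unknown state-reward vector. All sizes, priors, and parameter names here are illustrative assumptions; this does not reproduce the paper's data-augmentation or parameter-expansion construction.

```python
import numpy as np

# --- Hypothetical toy setup; all names and sizes are illustrative. ---
rng = np.random.default_rng(0)
S, A, gamma, beta = 5, 2, 0.9, 3.0   # states, actions, discount, softmax "rationality"

# A fixed transition kernel P[a, s, s'], assumed known to the learner.
P = rng.dirichlet(np.ones(S), size=(A, S))          # shape (A, S, S)

def q_values(r, n_iter=200):
    """Value iteration for the Q-function induced by reward vector r (one reward per state)."""
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        V = Q.max(axis=1)
        Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
    return Q

def log_lik(r, states, actions):
    """Log-likelihood of observed actions under a softmax ("noisy") policy."""
    Q = beta * q_values(r)
    logZ = np.logaddexp.reduce(Q, axis=1)            # per-state normalizer
    return np.sum(Q[states, actions] - logZ[states])

# Simulate demonstration data from a "true" noisy controller.
r_true = rng.normal(size=S)
states = rng.integers(0, S, size=300)
Q_true = beta * q_values(r_true)
probs = np.exp(Q_true - np.logaddexp.reduce(Q_true, axis=1, keepdims=True))
actions = np.array([rng.choice(A, p=probs[s]) for s in states])

# Random-walk Metropolis over the reward vector, with a standard normal prior.
def log_post(r):
    return -0.5 * r @ r + log_lik(r, states, actions)

r, lp, samples = np.zeros(S), log_post(np.zeros(S)), []
for it in range(2000):
    r_prop = r + 0.1 * rng.normal(size=S)
    lp_prop = log_post(r_prop)
    if np.log(rng.uniform()) < lp_prop - lp:         # MH accept/reject
        r, lp = r_prop, lp_prop
    samples.append(r.copy())

print("posterior mean reward:", np.mean(samples[500:], axis=0))
```

Note that each proposal requires re-solving the MDP by value iteration, and a plain random-walk chain can mix poorly in such latent-variable models; the abstract indicates that the paper's sampler instead uses data augmentation with a parameter expansion step, which is shown to be essential for good convergence.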
Pages: 25