Bayesian Learning of Noisy Markov Decision Processes

Cited by: 4
Authors
Singh, Sumeetpal S. [1 ]
Chopin, Nicolas [2 ,3 ]
Whiteley, Nick [4 ]
Affiliations
[1] Univ Cambridge, Dept Engn, Cambridge CB2 1PZ, England
[2] CREST ENSAE, Paris, France
[3] HEC Paris, Paris, France
[4] Univ Bristol, Sch Math, Bristol BS8 1TW, Avon, England
Source
ACM TRANSACTIONS ON MODELING AND COMPUTER SIMULATION | 2013, Vol. 23, No. 1
Keywords
Data augmentation; parameter expansion; Markov chain Monte Carlo; Markov decision process; Bayesian inference
DOI
10.1145/2414416.2414420
Chinese Library Classification (CLC)
TP39 [Computer applications];
Discipline codes
081203; 0835;
Abstract
We consider the inverse reinforcement learning problem, that is, the problem of learning from, and then predicting or mimicking, a controller based on state/action data. We propose a statistical model for such data, derived from the structure of a Markov decision process. Adopting a Bayesian approach to inference, we show how latent variables of the model can be estimated and how predictions about actions can be made within a unified framework. A new Markov chain Monte Carlo (MCMC) sampler is devised for simulation from the posterior distribution. The sampler incorporates a parameter expansion step, which is shown to be essential for good convergence properties of the chain. As an illustration, the method is applied to learning a human controller.
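The abstract outlines the approach at a high level: observed state/action pairs are modelled as draws from a noisy (stochastic) controller for a Markov decision process, and posterior inference is carried out by MCMC. As a rough orientation only, the sketch below implements a generic random-walk Metropolis sampler for a toy version of this setting, with a softmax ("noisy") policy and an unknown state-reward vector. All sizes, priors, and parameter names here are illustrative assumptions; this does not reproduce the paper's data-augmentation or parameter-expansion construction.

```python
import numpy as np

# --- Hypothetical toy setup; all names and sizes are illustrative. ---
rng = np.random.default_rng(0)
S, A, gamma, beta = 5, 2, 0.9, 3.0   # states, actions, discount, softmax "rationality"

# A fixed transition kernel P[a, s, s'], assumed known to the learner.
P = rng.dirichlet(np.ones(S), size=(A, S))          # shape (A, S, S)

def q_values(r, n_iter=200):
    """Value iteration for the Q-function induced by reward vector r (one reward per state)."""
    Q = np.zeros((S, A))
    for _ in range(n_iter):
        V = Q.max(axis=1)
        Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
    return Q

def log_lik(r, states, actions):
    """Log-likelihood of observed actions under a softmax ("noisy") policy."""
    Q = beta * q_values(r)
    logZ = np.logaddexp.reduce(Q, axis=1)            # per-state normalizer
    return np.sum(Q[states, actions] - logZ[states])

# Simulate demonstration data from a "true" noisy controller.
r_true = rng.normal(size=S)
states = rng.integers(0, S, size=300)
Q_true = beta * q_values(r_true)
probs = np.exp(Q_true - np.logaddexp.reduce(Q_true, axis=1, keepdims=True))
actions = np.array([rng.choice(A, p=probs[s]) for s in states])

# Random-walk Metropolis over the reward vector, with a standard normal prior.
def log_post(r):
    return -0.5 * r @ r + log_lik(r, states, actions)

r, lp, samples = np.zeros(S), log_post(np.zeros(S)), []
for it in range(2000):
    r_prop = r + 0.1 * rng.normal(size=S)
    lp_prop = log_post(r_prop)
    if np.log(rng.uniform()) < lp_prop - lp:         # MH accept/reject
        r, lp = r_prop, lp_prop
    samples.append(r.copy())

print("posterior mean reward:", np.mean(samples[500:], axis=0))
```

Note that each proposal requires re-solving the MDP by value iteration, and a plain random-walk chain can mix poorly in such latent-variable models; the abstract indicates that the paper's sampler instead uses data augmentation with a parameter expansion step, which is shown to be essential for good convergence.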
Pages: 25