Analysis of Inverse Reinforcement Learning with Perturbed Demonstrations

Cited by: 5

Authors:

Melo, Francisco S. [1]

Lopes, Manuel [2]

Ferreira, Ricardo [3]

Affiliations:

[1] INESC ID Portugal, IST, Lisbon, Portugal

[2] Univ Plymouth, Plymouth, Devon, England

[3] Inst Syst & Robot, Lisbon, Portugal

Source:

ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2010 / Vol. 215

Funding:

European Union Seventh Framework Programme;

Keywords:

DOI:

10.3233/978-1-60750-606-5-349

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Inverse reinforcement learning (IRL) addresses the problem of recovering the unknown reward function for a given Markov decision problem (MDP) given the corresponding optimal policy or a perturbed version thereof. This paper studies the space of possible solutions to the general IRL problem, when the agent is provided with incomplete/imperfect information regarding the optimal policy for the MDP whose reward must be estimated. We focus on scenarios with finite state-action spaces and discuss the constraints imposed on the set of possible solutions when the agent is provided with (i) perturbed policies; (ii) optimal policies; and (iii) incomplete policies. We discuss previous works on IRL in light of our analysis and show that, with our characterization of the solution space, it is possible to determine non-trivial closed-form solutions for the IRL problem. We also discuss several other interesting aspects of the IRL problem that stem from our analysis.


Pages: 349-354

Page count: 6

References: 11 in total

[1] Andrieu, C; de Freitas, N; Doucet, A; Jordan, MI. An introduction to MCMC for machine learning [J]. MACHINE LEARNING, 2003, 50 (1-2): 5-43

[2] [Anonymous], 2004, ICML

[3] [Anonymous], THESIS

[4] da Silva VF, 2006, IEEE INT CONF ROBOT, P4246

[5] Neu G., 2007, C UNC ART INT UAI, P295

[6] Ng A. Y., 2000, P INT C MACH LEARN I, P663

[7] Ng AY, 1999, MACHINE LEARNING, PROCEEDINGS, P278

[8] Ramachandran D., 2007, IJCAI, P2586

[9] Russell S., 1998, ADV NEURAL INFORM PR, V10

[10] Syed U., 2008, P INT C MACHINE LEAR, P1032, DOI 10.1145/1390156.1390286
