Compatible Reward Inverse Reinforcement Learning

Cited by: 0

Authors:

Metelli, Alberto Maria ^{[1]}

Pirotta, Matteo ^{[2]}

Restelli, Marcello ^{[1]}

Affiliations:

[1] Politecnico di Milano, DEIB, Milan, Italy

[2] Inria Lille, SequeL Team, Lille, France

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017) | 2017 / Vol. 30

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Inverse Reinforcement Learning (IRL) is an effective approach for recovering a reward function that explains the behavior of an expert from a set of observed demonstrations. This paper presents a novel model-free IRL approach that, unlike most existing IRL algorithms, does not require specifying a function space in which to search for the expert's reward function. Leveraging the fact that the policy gradient must be zero for any optimal policy, the algorithm generates a set of basis functions that span the subspace of reward functions that make the policy gradient vanish. Within this subspace, using a second-order criterion, we search for the reward function that most penalizes deviations from the expert's policy. After introducing our approach for finite domains, we extend it to continuous ones. The proposed approach is empirically compared with other IRL methods in both the (finite) Taxi domain and the (continuous) Linear Quadratic Gaussian (LQG) and Car on the Hill environments.
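
The first-order idea in the abstract can be illustrated with a small sketch. It is not the authors' algorithm: the paper is model-free and estimates everything from demonstrations, whereas this sketch assumes a fully known finite MDP and a tabular softmax policy, and it omits the second-order selection step entirely. Under those assumptions the policy gradient is linear in the reward, grad J(r) = G r, so the rewards that make the gradient vanish are the null space of G. All names here (`gradient_operator`, `null_space_basis`, `P`, `pi`, `mu`, `gamma`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def gradient_operator(P, pi, mu, gamma):
    """Return G with grad_theta J(r) = G @ r (known model, tabular softmax policy)."""
    S, A = pi.shape
    # Transition matrix over state-action pairs: (s, a) -> (s', a') under pi.
    P_pi = (P[:, :, :, None] * pi[None, None, :, :]).reshape(S * A, S * A)
    # Q = M @ r, where M is the discounted resolvent of P_pi.
    M = np.linalg.inv(np.eye(S * A) - gamma * P_pi)
    # Unnormalized discounted state-action occupancy from the initial distribution.
    delta = (mu[:, None] * pi).reshape(-1) @ M
    # Score function grad_theta log pi(a|s) for a tabular softmax, one row per (s, a).
    Psi = np.zeros((S * A, S * A))
    for s in range(S):
        Psi[s * A:(s + 1) * A, s * A:(s + 1) * A] = np.eye(A) - pi[s][None, :]
    # Policy gradient theorem: sum over (s, a) of delta * score * Q.
    return Psi.T @ (delta[:, None] * M)

def null_space_basis(G, tol=1e-10):
    """Orthonormal basis (as columns) of the rewards r with G @ r = 0, via SVD."""
    _, sing, Vt = np.linalg.svd(G)
    rank = int(np.sum(sing > tol * sing.max()))
    return Vt[rank:].T

# Toy problem with randomly generated dynamics and policy (assumed, not from the paper).
rng = np.random.default_rng(0)
S, A, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(S), size=(S, A))   # P[s, a, s'] transition probabilities
pi = rng.dirichlet(np.ones(A), size=S)       # pi[s, a] stands in for the expert policy
mu = np.full(S, 1.0 / S)                     # uniform initial state distribution

G = gradient_operator(P, pi, mu, gamma)
basis = null_space_basis(G)                  # each column is a reward over (s, a) pairs
```

Every column of `basis` is a reward vector under which the given policy is a stationary point of the expected return. Choosing among them is the role the abstract assigns to the second-order criterion, which this sketch does not attempt.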

Pages: 10
