Convergence analysis of an incremental approach to online inverse reinforcement learning

Cited by: 5

Authors:

Jin, Zhuo-jun [1]

Qian, Hui [1]

Chen, Shen-yi [1]

Zhu, Miao-liang [1]

Affiliations:

[1] Zhejiang Univ, Sch Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China

Source:

JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS | 2011 / Vol. 12 / Issue 01

Funding:

National Natural Science Foundation of China;

Keywords:

Incremental approach; Reward recovering; Online learning; Inverse reinforcement learning; Markov decision process;

DOI:

10.1631/jzus.C1010010

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Interest has recently increased in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem was investigated, and bounds on both the number of mistakes made during the learning process and the regret were derived with a detailed proof. Then an online algorithm based on incremental error correcting was derived to deal with the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs. This leads to an estimate that approaches a target optimal value. The proposed method was tested in a driving simulation experiment and found to be able to efficiently recover an adequate reward function.
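The mismatch-driven update described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a reward that is linear in hypothetical per-state, per-action feature vectors, it picks the learner's action greedily from the current estimate instead of solving the MDP, and it uses a perceptron-style increment. The names `incremental_reward_sketch`, `features`, `expert_actions`, and `step` are illustrative assumptions.

```python
def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def incremental_reward_sketch(features, expert_actions, step=1.0, max_passes=50):
    """Perceptron-style sketch of incremental error correction.

    features[s][a] is a hypothetical feature vector for action a in state s.
    expert_actions[s] is the index of the action the expert took in state s.
    Returns the reward weight estimate and the number of mismatches seen.
    """
    dim = len(features[0][0])
    w = [0.0] * dim          # current reward weight estimate
    mistakes = 0
    for _ in range(max_passes):
        mismatched = False
        for phi_s, a_expert in zip(features, expert_actions):
            # Assumption: the learner acts greedily on the one-step score.
            scores = [dot(w, phi) for phi in phi_s]
            a_pred = max(range(len(scores)), key=scores.__getitem__)
            if a_pred != a_expert:
                # Action mismatch: add an increment toward the expert's choice.
                w = [wi + step * (pe - pp)
                     for wi, pe, pp in zip(w, phi_s[a_expert], phi_s[a_pred])]
                mistakes += 1
                mismatched = True
        if not mismatched:
            break            # a full pass with no mismatch: stop updating
    return w, mistakes
```

In this sketch the estimate changes only when the learner's action disagrees with the expert's, so the mismatch counter is the quantity that a mistake bound of the kind mentioned in the abstract would constrain.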

Pages: 17-24

Page count: 8
