Convergence analysis of an incremental approach to online inverse reinforcement learning

Cited by: 5
Authors
Jin, Zhuo-jun [1]
Qian, Hui [1]
Chen, Shen-yi [1]
Zhu, Miao-liang [1]
Affiliations
[1] Zhejiang Univ, Sch Comp Sci & Technol, Hangzhou 310027, Zhejiang, Peoples R China
Source
JOURNAL OF ZHEJIANG UNIVERSITY-SCIENCE C-COMPUTERS & ELECTRONICS | 2011 / Vol. 12 / No. 01
Funding
National Natural Science Foundation of China
Keywords
Incremental approach; Reward recovering; Online learning; Inverse reinforcement learning; Markov decision process;
DOI
10.1631/jzus.C1010010
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Interest has recently increased in inverse reinforcement learning (IRL), the problem of recovering the reward function underlying a Markov decision process (MDP) given the dynamics of the system and the behavior of an expert. This paper deals with an incremental approach to online IRL. First, the convergence property of the incremental method for the IRL problem is investigated, and bounds on both the number of mistakes made during learning and the regret are established with a detailed proof. Then an online algorithm based on incremental error correction is derived for the IRL problem. The key idea is to add an increment to the current reward estimate each time an action mismatch occurs, so that the estimate approaches a target optimal value. The proposed method was tested in a driving simulation experiment and was found to recover an adequate reward function efficiently.
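The error-correcting idea in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper. It is a perceptron-style stand-in under stated assumptions: the value of each action is linear in a weight vector `w`, every state-action pair has a known feature vector, and the expert's action in each state is observable. The names `PHI`, `EXPERT`, `value`, `greedy`, and `learn`, and the toy numbers, are all hypothetical.

```python
# Hypothetical toy problem: 3 states, 2 actions, 2-dimensional features.
# PHI[s][a] is an assumed feature vector; the estimated value of taking
# action a in state s is the dot product of w with PHI[s][a].
PHI = [
    [(1.0, 0.0), (0.0, 1.0)],
    [(0.0, 0.0), (1.0, 0.0)],
    [(1.0, 1.0), (0.0, 0.0)],
]
EXPERT = [1, 1, 0]  # expert's action in each state (assumed observable)


def value(w, s, a):
    """Estimated value of action a in state s under reward weights w."""
    return sum(wi * fi for wi, fi in zip(w, PHI[s][a]))


def greedy(w, s):
    """Action the learner would take; ties go to the lowest action index."""
    return max(range(len(PHI[s])), key=lambda a: value(w, s, a))


def learn(eta=1.0, max_passes=100):
    """Add an increment to w each time the learner's action mismatches the expert's."""
    w = [0.0, 0.0]
    mistakes = 0
    for _ in range(max_passes):
        changed = False
        for s, a_exp in enumerate(EXPERT):
            a_hat = greedy(w, s)
            if a_hat != a_exp:
                # Increment: move w toward the expert's features and
                # away from the features of the mistaken action.
                w = [wi + eta * (fe - fh)
                     for wi, fe, fh in zip(w, PHI[s][a_exp], PHI[s][a_hat])]
                mistakes += 1
                changed = True
        if not changed:  # a full pass without any mismatch: stop
            break
    return w, mistakes


w, mistakes = learn()
policy = [greedy(w, s) for s in range(len(PHI))]
print("weights:", w, "mistakes:", mistakes, "policy:", policy)
```

In this toy setting the loop stops once a full pass produces no mismatch, and the running `mistakes` counter is the kind of quantity the abstract says the paper bounds. The step size `eta` and the stopping rule are illustrative choices, not values taken from the paper.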
Pages: 17-24
Number of pages: 8