Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward

Cited by: 0
Authors
Yusuke N. [1]
Sachiyo A. [2]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Bayesian inference; Inverse reinforcement learning; Markov decision processes; Reinforcement learning
DOI
10.1527/tjsai.G-J73
Abstract
Although the reinforcement learning framework has produced numerous achievements, it requires careful shaping of a reward function that represents the objective of a task. There is a class of tasks in which an expert can demonstrate the optimal behavior, yet designing a proper reward function is difficult. For such tasks, an inverse reinforcement learning approach is useful because it estimates a reward function from the expert's demonstrations. Most existing inverse reinforcement learning algorithms assume that the expert gives demonstrations in a single environment. However, an expert can also provide demonstrations of a task in multiple environments that share the same objective. For example, although it is hard to state the objective of a driving task explicitly, a driver can give demonstrations under multiple situations. In such cases, it is natural to use these demonstrations from multiple environments to estimate the expert's reward function. We formulate this problem as a Bayesian inverse reinforcement learning problem and propose a Markov chain Monte Carlo method for it. Experimental results show that the proposed method quantitatively outperforms existing methods. © 2020, Japanese Society for Artificial Intelligence. All rights reserved.
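The abstract does not include an implementation, but the general idea it describes can be illustrated with a minimal sketch: Metropolis-Hastings sampling over a single reward vector whose likelihood aggregates Boltzmann-rational demonstration probabilities across several transition models (dynamics). All function names, the tabular-MDP setting, the softmax temperature beta, and the uniform reward prior below are illustrative assumptions, not details of the paper's method.

    import numpy as np

    def soft_value_iteration(P, R, gamma=0.95, beta=5.0, iters=200):
        """Boltzmann-rational log-policy for reward vector R under dynamics P.

        P: transition tensor of shape (A, S, S); R: state reward of shape (S,).
        Returns log pi(a|s) as an array of shape (A, S).
        """
        V = np.zeros(R.shape[0])
        for _ in range(iters):
            Q = R[None, :] + gamma * (P @ V)                  # (A, S)
            m = (beta * Q).max(axis=0)
            V = (m + np.log(np.exp(beta * Q - m).sum(axis=0))) / beta
        Q = R[None, :] + gamma * (P @ V)
        m = (beta * Q).max(axis=0)
        log_Z = m + np.log(np.exp(beta * Q - m).sum(axis=0))  # per-state normalizer
        return beta * Q - log_Z[None, :]

    def log_likelihood(R, envs, demos, beta=5.0):
        """Sum demonstration log-likelihoods over all environments (dynamics)."""
        total = 0.0
        for P, trajectories in zip(envs, demos):
            log_pi = soft_value_iteration(P, R, beta=beta)
            for traj in trajectories:                         # traj: list of (s, a)
                for s, a in traj:
                    total += log_pi[a, s]
        return total

    def mcmc_birl(envs, demos, n_states, n_samples=2000, step=0.1, seed=None):
        """Metropolis-Hastings over a reward vector shared across all dynamics."""
        rng = np.random.default_rng(seed)
        R = rng.uniform(-1.0, 1.0, size=n_states)
        ll = log_likelihood(R, envs, demos)
        samples = []
        for _ in range(n_samples):
            R_new = R + rng.normal(0.0, step, size=n_states)  # symmetric proposal
            if np.all(np.abs(R_new) <= 1.0):                  # uniform prior on [-1, 1]^S
                ll_new = log_likelihood(R_new, envs, demos)
                if np.log(rng.uniform()) < ll_new - ll:
                    R, ll = R_new, ll_new
            samples.append(R.copy())
        return np.array(samples)

In this sketch, the mean of the sampled reward vectors could then be used to plan in an environment whose dynamics were not seen in the demonstrations, which is the transfer setting the title refers to.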
Related Papers (50 records)
  • [1] Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning
    Nakata Y.
    Arai S.
    Transactions of the Japanese Society for Artificial Intelligence, 2019, 34 (06)
  • [2] Active Learning for Reward Estimation in Inverse Reinforcement Learning
    Lopes, Manuel
    Melo, Francisco
    Montesano, Luis
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 31 - +
  • [3] Inverse Reinforcement Learning of Interaction Dynamics from Demonstrations
    Hussein, Mostafa
    Begum, Momotaz
    Petrik, Marek
    2019 INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), 2019, : 2267 - 2274
  • [4] Analysis of Inverse Reinforcement Learning with Perturbed Demonstrations
    Melo, Francisco S.
    Lopes, Manuel
    Ferreira, Ricardo
    ECAI 2010 - 19TH EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2010, 215 : 349 - 354
  • [5] Inverse Constraint Learning and Generalization by Transferable Reward Decomposition
    Jang, Jaehwi
    Song, Minjae
    Park, Daehyung
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (01) : 279 - 286
  • [6] Reward Identification in Inverse Reinforcement Learning
    Kim, Kuno
    Garg, Shivam
    Shiragur, Kirankumar
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] Compatible Reward Inverse Reinforcement Learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [8] Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics
    Herman, Michael
    Gindele, Tobias
    Wagner, Joerg
    Schmitt, Felix
    Burgard, Wolfram
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 51, 2016, 51 : 102 - 110
  • [9] Learning Fairness from Demonstrations via Inverse Reinforcement Learning
    Blandin, Jack
    Kash, Ian
    PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024, 2024, : 51 - 61
  • [10] An Unified Approach to Inverse Reinforcement Learning by Oppositive Demonstrations
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Tseng, Yi-Chia
    PROCEEDINGS 2016 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2016, : 1664 - 1668