Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward

Cited: 0
Authors
Yusuke N. [1]
Sachiyo A. [2 ]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Bayesian inference; Inverse reinforcement learning; Markov decision processes; Reinforcement learning
DOI
10.1527/tjsai.G-J73
Abstract
Although the reinforcement learning framework has produced numerous achievements, it requires careful shaping of a reward function that represents the objective of a task. There is a class of tasks in which an expert can demonstrate the optimal behavior but for which it is difficult to design a proper reward function. For these tasks, an inverse reinforcement learning approach is useful because it makes it possible to estimate a reward function from the expert's demonstrations. Most existing inverse reinforcement learning algorithms assume that the expert gives demonstrations in a single environment. However, an expert could also provide demonstrations of the task in other environments that share the same objective. For example, although the objective of a driving task is hard to represent explicitly, a driver can give demonstrations in multiple situations. In such cases, it is natural to use these demonstrations in multiple environments to estimate the expert's reward function. We formulate this problem as a Bayesian inverse reinforcement learning problem and propose a Markov chain Monte Carlo method for it. Experimental results show that the proposed method quantitatively outperforms existing methods. © 2020, Japanese Society for Artificial Intelligence. All rights reserved.
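The abstract describes the approach only at a high level: one reward is shared across environments whose transition dynamics differ, and a Markov chain Monte Carlo sampler draws from the reward posterior given all demonstrations. Below is a minimal sketch of that idea, not the authors' algorithm: it assumes a maximum-entropy (soft) demonstration likelihood, a standard Gaussian prior, and a random-walk Metropolis-Hastings proposal, and every function name here (soft_values, log_posterior, mh_reward_samples) is an illustrative invention.

import numpy as np

def soft_values(P, r, gamma=0.95, iters=200):
    """Soft (maximum-entropy) value iteration for one environment.
    P: (A, S, S) transition tensor, r: (S,) state reward. Returns Q, V."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        # Q[s, a] = r[s] + gamma * sum_t P[a, s, t] * V[t]
        Q = r[:, None] + gamma * np.einsum('ast,t->sa', P, V)
        m = Q.max(axis=1)
        V = m + np.log(np.exp(Q - m[:, None]).sum(axis=1))  # soft max over actions
    return Q, V

def log_posterior(r, envs, demos):
    """Gaussian prior on r, plus demonstration log-likelihoods pooled over
    all environments; dynamics differ but the reward r is shared."""
    lp = -0.5 * (r @ r)
    for P, pairs in zip(envs, demos):
        Q, V = soft_values(P, r)
        for s, a in pairs:            # Boltzmann policy: log pi(a|s) = Q - V
            lp += Q[s, a] - V[s]
    return lp

def mh_reward_samples(envs, demos, n_states, n_iter=2000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings chain over the shared reward vector."""
    rng = np.random.default_rng(seed)
    r = np.zeros(n_states)
    lp = log_posterior(r, envs, demos)
    samples = []
    for _ in range(n_iter):
        r_prop = r + step * rng.standard_normal(n_states)
        lp_prop = log_posterior(r_prop, envs, demos)
        if np.log(rng.random()) < lp_prop - lp:   # accept or reject proposal
            r, lp = r_prop, lp_prop
        samples.append(r.copy())
    return np.asarray(samples)

# Example (synthetic): two 4-state, 2-action environments with different
# random dynamics and a handful of (state, action) demonstration pairs.
S, A = 4, 2
rng = np.random.default_rng(1)
envs = [rng.dirichlet(np.ones(S), size=(A, S)) for _ in range(2)]
demos = [[(0, 1), (1, 1), (2, 0)], [(3, 0), (2, 1)]]
post = mh_reward_samples(envs, demos, S)
print(post[len(post) // 2:].mean(axis=0))  # posterior-mean reward after burn-in

Pooling the per-environment likelihoods into a single acceptance ratio is what ties the estimate to the shared, transferable reward rather than to any one set of dynamics.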
Related papers (50 in total)
  • [31] Contextual Action with Multiple Policies Inverse Reinforcement Learning for Behavior Simulation
    Alvarez, Nahum
    Noda, Itsuki
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE (ICAART), VOL 2, 2019, : 887 - 894
  • [32] Adversarial Inverse Reinforcement Learning to Estimate Policies from Multiple Experts
    Yamashita K.
    Hamagami T.
Institute of Electrical Engineers of Japan, 2021, 141: 1405 - 1410
  • [33] Multi-objective deep inverse reinforcement learning for weight estimation of objectives
    Takayama, Naoya
    Arai, Sachiyo
    ARTIFICIAL LIFE AND ROBOTICS, 2022, 27 (03) : 594 - 602
  • [35] An investor sentiment reward-based trading system using Gaussian inverse reinforcement learning algorithm
    Yang, Steve Y.
    Yu, Yangyang
Almahdi, Saud
    EXPERT SYSTEMS WITH APPLICATIONS, 2018, 114 : 388 - 401
  • [36] Accelerated Inverse Reinforcement Learning with Randomly Pre-sampled Policies for Autonomous Driving Reward Design
    Xin, Long
    Li, Shengbo Eben
    Wang, Pin
    Cao, Wenhan
    Nie, Bingbing
    Chan, Ching-Yao
    Cheng, Bo
    2019 IEEE INTELLIGENT TRANSPORTATION SYSTEMS CONFERENCE (ITSC), 2019, : 2757 - 2764
  • [37] Cooperative Multi-agent Inverse Reinforcement Learning Based on Selfish Expert and its Behavior Archives
    Fukumoto, Yukiko
    Tadokoro, Masakazu
    Takadama, Keiki
    2020 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (SSCI), 2020, : 2202 - 2209
  • [38] Optimal drug-dosing of cancer dynamics with fuzzy reinforcement learning and discontinuous reward function
    Treesatayapun, Chidentree
    Munoz-Vazquez, Aldo Jonathan
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2023, 120
  • [39] Multi-robot inverse reinforcement learning under occlusion with estimation of state transitions
    Bogert, Kenneth
    Doshi, Prashant
    ARTIFICIAL INTELLIGENCE, 2018, 263 : 46 - 73
  • [40] A review on modeling tumor dynamics and agent reward functions in reinforcement learning based therapy optimization
    Almasy, Marton Gyorgy
    Horompo, Andras
    Kiss, Daniel
    Kertesz, Gabor
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2022, 43 (06) : 6939 - 6946