Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Citations: 0
Authors
Nakata Y. [1 ]
Arai S. [2 ]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Inverse reinforcement learning; Linear programming
DOI
10.1527/tjsai.B-J23
Abstract
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given in a single fixed environment, but in some cases they are collected in multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments together with the expert policy for each environment. We assume that the expert shares a single reward function across the environments, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a linear program that maximizes the sum of the original objective functions of the individual environments while satisfying the optimality conditions of all given environments. Satisfying the conditions of all given environments is a necessary condition for a reward to match the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy grid world environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments. © 2019, Japanese Society for Artificial Intelligence. All rights reserved.
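For orientation, the following is a rough sketch of how such an extension can be read against the standard linear-programming IRL formulation of Ng and Russell (2000), which the abstract says is extended; the notation (K environments, transition matrices P_a^{(k)}, expert policies \pi_k, regularization parameter \lambda, reward bound R_max) is illustrative and may differ from the paper's own. Each per-environment term below is the usual single-environment LP-IRL objective, and the shared reward vector R is constrained by the optimality conditions of every environment at once:
\[
\max_{R}\ \sum_{k=1}^{K}\sum_{s}\ \min_{a \neq \pi_k(s)} \big(P^{(k)}_{\pi_k(s)}(s) - P^{(k)}_{a}(s)\big)\big(I - \gamma P^{(k)}_{\pi_k}\big)^{-1} R \;-\; \lambda \lVert R \rVert_1
\]
\[
\text{s.t.}\quad \big(P^{(k)}_{\pi_k} - P^{(k)}_{a}\big)\big(I - \gamma P^{(k)}_{\pi_k}\big)^{-1} R \succeq 0 \quad \text{for all } k \text{ and all } a \neq \pi_k, \qquad |R(s)| \le R_{\max},
\]
where P^{(k)}_a(s) denotes the row of environment k's transition matrix for state s under action a. Imposing the constraint set of every environment is what makes satisfying all environments' optimality conditions a necessary condition on the shared reward, as stated in the abstract.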
Related papers (50 in total)
  • [1] Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward
    Nakata, Yusuke
    Arai, Sachiyo
    Transactions of the Japanese Society for Artificial Intelligence, 2020, 35 (01)
  • [2] Inverse Reinforcement Learning with Locally Consistent Reward Functions
    Quoc Phong Nguyen
    Low, Kian Hsiang
    Jaillet, Patrick
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [3] Reward Identification in Inverse Reinforcement Learning
    Kim, Kuno
    Garg, Shivam
    Shiragur, Kirankumar
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Compatible Reward Inverse Reinforcement Learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming
    Koppel, Alec
    Bedi, Amrit Singh
    Ganguly, Bhargav
    Aggarwal, Vaneet
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 4545 - 4552
  • [6] Active Learning for Reward Estimation in Inverse Reinforcement Learning
    Lopes, Manuel
    Melo, Francisco
    Montesano, Luis
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782: 31+
  • [7] Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning
    Ghosh, Sayan
    Srivastava, Shashank
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1449 - 1462
  • [8] Option compatible reward inverse reinforcement learning
    Hwang, Rakhoon
    Lee, Hanjin
    Hwang, Hyung Ju
    PATTERN RECOGNITION LETTERS, 2022, 154 : 83 - 89
  • [9] Inverse Reinforcement Learning with the Average Reward Criterion
    Wu, Feiyang
    Ke, Jingyang
    Wu, Anqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] Constrained Bayesian Reinforcement Learning via Approximate Linear Programming
    Lee, Jongmin
    Jang, Youngsoo
    Poupart, Pascal
    Kim, Kee-Eung
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2088 - 2095