Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Cited by: 0
Authors
Nakata Y. [1 ]
Arai S. [2 ]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Inverse reinforcement learning; Linear programming
DOI
10.1527/tjsai.B-J23
Abstract
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given in a single fixed environment, but in some cases they are collected across multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments together with the expert policy for each environment. We assume that the experts in the multiple environments share a single reward function, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a linear program that maximizes the sum of the original objective functions of the individual environments while satisfying the optimality conditions of all given environments. Satisfying the conditions of all given environments is a necessary condition for matching the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy grid world environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments. © 2019, Japanese Society for Artificial Intelligence. All rights reserved.
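The extension the abstract describes can be sketched as a single linear program built from Ng and Russell's LP IRL conditions, with the optimality constraints of every environment imposed jointly and the per-environment objectives summed. The following is a minimal sketch under assumed discrete tabular MDPs with a deterministic expert policy per environment; the function name `multi_env_lp_irl`, the L1 penalty weight, and the reward bound are illustrative choices, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def multi_env_lp_irl(transitions, policies, gamma=0.9, l1=1.0, r_max=1.0):
    """Estimate one reward consistent with expert policies in several MDPs.

    transitions: list of arrays P[a, s, s'] (one per environment)
    policies:    list of int arrays pi[s] (deterministic expert policy per env)
    Returns a reward vector R over the shared state space.
    """
    n = transitions[0].shape[1]
    n_env = len(transitions)
    # Variable layout: [R (n), slack t_{e,s} (n per env), u (n) with u >= |R|]
    n_var = n + n_env * n + n
    c = np.zeros(n_var)
    c[n:n + n_env * n] = -1.0      # maximize summed slacks over all envs
    c[n + n_env * n:] = l1         # L1 penalty encouraging sparse rewards
    A_ub, b_ub = [], []
    for e, (P, pi) in enumerate(zip(transitions, policies)):
        P_pi = P[pi, np.arange(n), :]               # expert transition matrix
        inv = np.linalg.inv(np.eye(n) - gamma * P_pi)
        for a in range(P.shape[0]):
            diff = (P_pi - P[a]) @ inv              # one row per state
            for s in range(n):
                if a == pi[s]:
                    continue
                # Optimality of the expert in env e: diff[s] @ R >= 0
                row = np.zeros(n_var); row[:n] = -diff[s]
                A_ub.append(row); b_ub.append(0.0)
                # Slack bounded by the tightest margin: t_{e,s} <= diff[s] @ R
                row = np.zeros(n_var); row[:n] = -diff[s]
                row[n + e * n + s] = 1.0
                A_ub.append(row); b_ub.append(0.0)
    # Enforce u >= |R| via R - u <= 0 and -R - u <= 0
    for s in range(n):
        for sign in (1.0, -1.0):
            row = np.zeros(n_var)
            row[s] = sign; row[n + n_env * n + s] = -1.0
            A_ub.append(row); b_ub.append(0.0)
    bounds = ([(-r_max, r_max)] * n
              + [(None, None)] * (n_env * n)
              + [(0.0, None)] * n)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:n]
```

On a toy pair of two-state, two-action MDPs whose dynamics differ but whose experts both steer toward the same state, the program recovers a reward that is strictly higher in that state, since the optimality constraints of both environments must hold simultaneously.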
Related papers
50 items in total
  • [21] Towards Accurate And Robust Dynamics and Reward Modeling for Model-Based Offline Inverse Reinforcement Learning
    Zhang, Gengyu
    Yan, Yan
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS 2024, 2024, : 611 - 618
  • [22] Regularized Multiple Criteria Linear Programming via Linear Programming
    Qi, Zhiquan
    Tian, Yingjie
    Shi, Yong
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012, 2012, 9 : 1234 - 1239
  • [23] Predicting driving behavior using inverse reinforcement learning with multiple reward functions towards environmental diversity
    Shimosaka, Masamichi
    Nishi, Kentaro
    Sato, Junichi
    Kataoka, Hirokatsu
    2015 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2015, : 567 - 572
  • [24] Bayesian Inverse Reinforcement Learning-based Reward Learning for Automated Driving
    Zeng, Di
    Zheng, Ling
    Li, Yinong
    Yang, Xiantong
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2024, 60 (10): : 245 - 260
  • [25] Machining sequence learning via inverse reinforcement learning
    Sugisawa, Yasutomo
    Takasugi, Keigo
    Asakawa, Naoki
    PRECISION ENGINEERING-JOURNAL OF THE INTERNATIONAL SOCIETIES FOR PRECISION ENGINEERING AND NANOTECHNOLOGY, 2022, 73 : 477 - 487
  • [26] Parallel reinforcement learning using multiple reward signals
    Raicevic, Peter
    NEUROCOMPUTING, 2006, 69 (16-18) : 2171 - 2179
  • [27] Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation
    Kim, Woo Kyung
    Yoo, Minjong
    Woo, Honguk
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4300 - 4307
  • [28] Off-Dynamics Inverse Reinforcement Learning
    Kang, Yachen
    Liu, Jinxin
    Wang, Donglin
    IEEE ACCESS, 2024, 12 : 65117 - 65127
  • [29] Transfer in Inverse Reinforcement Learning for Multiple Strategies
    Tanwani, Ajay Kumar
    Billard, Aude
    2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013, : 3244 - 3250
  • [30] Haptic Assistance via Inverse Reinforcement Learning
    Scobee, Dexter R. R.
    Royo, Vicenc Rubies
    Tomlin, Claire J.
    Sastry, S. Shankar
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 1510 - 1517