Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Cited by: 0
Authors
Nakata Y. [1 ]
Arai S. [2 ]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Inverse reinforcement learning; Linear programming
DOI
10.1527/tjsai.B-J23
Abstract
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given for a single fixed environment, but in practice they may be collected across multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments together with an expert policy for each environment. We assume that the experts in the multiple environments share a reward function, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a linear program that maximizes the sum of the original per-environment objective functions while satisfying the optimality constraints of all given environments. Satisfying the constraints of all given environments is a necessary condition for matching the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy Gridworld environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments. © 2019, Japanese Society for Artificial Intelligence. All rights reserved.
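The abstract describes extending linear programming IRL (in the style of Ng and Russell) so that one shared reward must make each expert policy optimal in its own environment, with the per-environment LP objectives summed and all environments' constraints imposed jointly. The Python sketch below is only an illustration of that general construction under stated assumptions, not the authors' implementation: it assumes a state-based reward, known transition models, deterministic expert policies, and hypothetical names such as lp_irl_multi and envs; cvxpy is used as the LP solver interface.

    import numpy as np
    import cvxpy as cp

    def lp_irl_multi(envs, gamma=0.9, r_max=1.0, lam=1.0):
        """Sketch: one state-based reward vector shared by all environments.

        `envs` is a list of (P, policy) pairs, where P[a, s, :] holds the
        transition probabilities of action a in state s, and policy[s] is the
        expert's action in state s.  (Names and shapes are assumptions.)
        """
        n_states = envs[0][0].shape[1]
        R = cp.Variable(n_states)                  # shared reward, one entry per state
        constraints = [cp.abs(R) <= r_max]
        objective_terms = []

        for P, policy in envs:
            n_actions = P.shape[0]
            # transition matrix induced by the expert policy in this environment
            P_pi = np.array([P[policy[s], s] for s in range(n_states)])
            inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)
            t = cp.Variable(n_states)              # per-state optimality margin
            for s in range(n_states):
                for a in range(n_actions):
                    if a == policy[s]:
                        continue
                    # value gap of the expert action over action a, linear in R
                    coeff = (P_pi[s] - P[a, s]) @ inv
                    constraints += [coeff @ R >= 0,     # expert action stays optimal
                                    t[s] <= coeff @ R]
            objective_terms.append(cp.sum(t))

        # maximize the summed per-environment margins with an L1 penalty on R
        problem = cp.Problem(cp.Maximize(sum(objective_terms) - lam * cp.norm1(R)),
                             constraints)
        problem.solve()
        return R.value

In a Windy Gridworld setting like the one mentioned in the abstract, each wind configuration would supply a different transition model P while the reward variable R is shared across all of them, which is the necessary condition the paper emphasizes.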
Related papers
50 items in total
  • [41] Optimal maintenance scheduling under uncertainties using Linear Programming-enhanced Reinforcement Learning
    Hu, Jueming
    Wang, Yuhao
    Pang, Yutian
    Liu, Yongming
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 109
  • [42] A Study of Linear Programming and Reinforcement Learning for One-Shot Game in Smart Grid Security
    Paul, Shuva
    Ni, Zhen
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [43] A Simple Sensitivity Analysis Method for Unmeasured Confounders via Linear Programming With Estimating Equation Constraints
    Tang, Chengyao
    Zhou, Yi
    Huang, Ao
    Hattori, Satoshi
    STATISTICS IN MEDICINE, 2025, 44 (3-4)
  • [44] Analyzing Sensor-Based Individual and Population Behavior Patterns via Inverse Reinforcement Learning
    Lin, Beiyu
    Cook, Diane J.
    SENSORS, 2020, 20 (18): 1-21
  • [45] OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching
    Hoshino, Hana
    Ota, Kei
    Kanezaki, Asako
    Yokota, Rio
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022
  • [46] Human motion analysis in medical robotics via high-dimensional inverse reinforcement learning
    Li, Kun
    Burdick, Joel W.
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2020, 39 (05): 568-585
  • [47] Sparse online maximum entropy inverse reinforcement learning via proximal optimization and truncated gradient
    Song L.
    Li D.
    Xu X.
    KNOWLEDGE-BASED SYSTEMS, 2022, 252
  • [48] A state-based inverse reinforcement learning approach to model activity-travel choices behavior with reward function recovery
    Song, Yuchen
    Li, Dawei
    Ma, Zhenliang
    Liu, Dongjie
    Zhang, Tong
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2024, 158
  • [49] Unraveling human social behavior motivations via inverse reinforcement learning-based link prediction
    Jiang, Xin
    Liu, Hongbo
    Yang, Liping
    Zhang, Bo
    Ward, Tomas E.
    Snasel, Vaclav
    COMPUTING, 2024, 106 (06) : 1963 - 1986
  • [50] A Bayesian Approach for Quantifying Data Scarcity when Modeling Human Behavior via Inverse Reinforcement Learning
    Hossain, Tahera
    Shen, Wanggang
    Antar, Anindya
    Prabhudesai, Snehal
    Inoue, Sozo
    Huan, Xun
    Banovic, Nikola
    ACM TRANSACTIONS ON COMPUTER-HUMAN INTERACTION, 2023, 30 (01)