Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Cited by: 0
Authors
Nakata Y. [1 ]
Arai S. [2 ]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Inverse reinforcement learning; Linear programming
DOI
10.1527/tjsai.B-J23
Abstract
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given for a single fixed environment, but in practice they may be collected across multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments together with an expert policy for each environment. We assume that the experts in the multiple environments share a reward function, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a linear program that maximizes the sum of the original per-environment objective functions while satisfying the optimality constraints of all given environments. Satisfying the constraints of all given environments is a necessary condition for matching the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy Gridworld environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments. © 2019, Japanese Society for Artificial Intelligence. All rights reserved.
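The abstract describes extending linear programming IRL (in the style of Ng and Russell) so that one shared reward must make each expert policy optimal in its own environment, with the per-environment LP objectives summed and all environments' constraints imposed jointly. The Python sketch below is only an illustration of that general construction under stated assumptions, not the authors' implementation: it assumes a state-based reward, known transition models, deterministic expert policies, and hypothetical names such as lp_irl_multi and envs; cvxpy is used as the LP solver interface.

    import numpy as np
    import cvxpy as cp

    def lp_irl_multi(envs, gamma=0.9, r_max=1.0, lam=1.0):
        """Sketch: one state-based reward vector shared by all environments.

        `envs` is a list of (P, policy) pairs, where P[a, s, :] holds the
        transition probabilities of action a in state s, and policy[s] is the
        expert's action in state s.  (Names and shapes are assumptions.)
        """
        n_states = envs[0][0].shape[1]
        R = cp.Variable(n_states)                  # shared reward, one entry per state
        constraints = [cp.abs(R) <= r_max]
        objective_terms = []

        for P, policy in envs:
            n_actions = P.shape[0]
            # transition matrix induced by the expert policy in this environment
            P_pi = np.array([P[policy[s], s] for s in range(n_states)])
            inv = np.linalg.inv(np.eye(n_states) - gamma * P_pi)
            t = cp.Variable(n_states)              # per-state optimality margin
            for s in range(n_states):
                for a in range(n_actions):
                    if a == policy[s]:
                        continue
                    # value gap of the expert action over action a, linear in R
                    coeff = (P_pi[s] - P[a, s]) @ inv
                    constraints += [coeff @ R >= 0,     # expert action stays optimal
                                    t[s] <= coeff @ R]
            objective_terms.append(cp.sum(t))

        # maximize the summed per-environment margins with an L1 penalty on R
        problem = cp.Problem(cp.Maximize(sum(objective_terms) - lam * cp.norm1(R)),
                             constraints)
        problem.solve()
        return R.value

In a Windy Gridworld setting like the one mentioned in the abstract, each wind configuration would supply a different transition model P while the reward variable R is shared across all of them, which is the necessary condition the paper emphasizes.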
Related papers
50 items in total
  • [41] Optimal maintenance scheduling under uncertainties using Linear Programming-enhanced Reinforcement Learning
    Hu, Jueming
    Wang, Yuhao
    Pang, Yutian
    Liu, Yongming
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 109
  • [42] A Study of Linear Programming and Reinforcement Learning for One-Shot Game in Smart Grid Security
    Paul, Shuva
    Ni, Zhen
    2018 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2018
  • [43] A Simple Sensitivity Analysis Method for Unmeasured Confounders via Linear Programming With Estimating Equation Constraints
    Tang, Chengyao
    Zhou, Yi
    Huang, Ao
    Hattori, Satoshi
    STATISTICS IN MEDICINE, 2025, 44 (3-4)
  • [44] Analyzing Sensor-Based Individual and Population Behavior Patterns via Inverse Reinforcement Learning
    Lin, Beiyu
    Cook, Diane J.
    SENSORS, 2020, 20 (18): 1-21
  • [45] OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching
    Hoshino, Hana
    Ota, Kei
    Kanezaki, Asako
    Yokota, Rio
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022
  • [46] Human motion analysis in medical robotics via high-dimensional inverse reinforcement learning
    Li, Kun
    Burdick, Joel W.
    INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2020, 39 (05): 568-585
  • [47] Sparse online maximum entropy inverse reinforcement learning via proximal optimization and truncated gradient
    Song L.
    Li D.
    Xu X.
    KNOWLEDGE-BASED SYSTEMS, 2022, 252
  • [48] A state-based inverse reinforcement learning approach to model activity-travel choices behavior with reward function recovery
    Song, Yuchen
    Li, Dawei
    Ma, Zhenliang
    Liu, Dongjie
    Zhang, Tong
    TRANSPORTATION RESEARCH PART C-EMERGING TECHNOLOGIES, 2024, 158
  • [49] Unraveling human social behavior motivations via inverse reinforcement learning-based link prediction
    Jiang, Xin
    Liu, Hongbo
    Yang, Liping
    Zhang, Bo
    Ward, Tomas E.
    Snasel, Vaclav
    COMPUTING, 2024, 106 (06) : 1963 - 1986
  • [50] A Bayesian Approach for Quantifying Data Scarcity when Modeling Human Behavior via Inverse Reinforcement Learning
    Hossain, Tahera
    Shen, Wanggang
    Antar, Anindya
    Prabhudesai, Snehal
    Inoue, Sozo
    Huan, Xun
    Banovic, Nikola
    ACM TRANSACTIONS ON COMPUTER-HUMAN INTERACTION, 2023, 30 (01)