Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Cited by: 0
Authors
Nakata Y. [1 ]
Arai S. [2 ]
Institutions
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Inverse reinforcement learning; Linear programming;
DOI
10.1527/tjsai.B-J23
Abstract
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given in a single fixed environment, but in practice they may be collected across multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments and the expert policy for each environment. We assume that the experts in the multiple environments share a reward function, and we estimate reward functions under which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a linear program that maximizes the sum of the original objective functions of the environments while satisfying the optimality conditions of all the given environments. Satisfying the conditions of all the given environments is a necessary condition for matching the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy Gridworld environments, we demonstrate that our algorithm recovers reward functions under which the expert policies are optimal in their corresponding environments. © 2019, Japanese Society for Artificial Intelligence. All rights reserved.
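The multi-environment linear program sketched in the abstract can be illustrated in the style of Ng and Russell's LP IRL: each environment contributes the expert-optimality constraints of the original formulation, and the objective sums the per-environment margins. The following is a minimal sketch under assumptions, not the authors' implementation: the function name `lp_irl_multi`, the omission of the L1 regularizer on the reward, and the fixed reward bound `r_max` are all illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

def lp_irl_multi(envs, gamma=0.9, r_max=1.0):
    """LP IRL over several environments assumed to share one reward.

    envs: list of (P, pi) pairs, where P has shape (A, n, n) with P[a, s]
    the transition distribution of action a in state s, and pi[s] is the
    expert action in state s for that environment.
    Returns a reward vector under which every expert policy satisfies the
    LP optimality constraints of its own environment.
    """
    n = envs[0][0].shape[1]          # number of states
    n_t = n * len(envs)              # one slack t_s per state per environment
    # Variable layout: x = [R (n entries), t for env 1 .. env E (n each)]
    c = np.zeros(n + n_t)
    c[n:] = -1.0                     # maximize sum of slacks == minimize -sum
    A_ub, b_ub = [], []
    for e, (P, pi) in enumerate(envs):
        P_star = np.array([P[pi[s]][s] for s in range(n)])   # expert rows
        G = np.linalg.inv(np.eye(n) - gamma * P_star)        # (I - g*P_pi)^-1
        for s in range(n):
            for a in range(P.shape[0]):
                if a == pi[s]:
                    continue
                m = (P_star[s] - P[a][s]) @ G    # advantage row acting on R
                # m @ R >= t_s   rewritten as   t_s - m @ R <= 0
                row = np.zeros(n + n_t)
                row[:n] = -m
                row[n + e * n + s] = 1.0
                A_ub.append(row); b_ub.append(0.0)
                # m @ R >= 0: expert optimality constraint in environment e
                row = np.zeros(n + n_t)
                row[:n] = -m
                A_ub.append(row); b_ub.append(0.0)
    bounds = [(-r_max, r_max)] * n + [(None, None)] * n_t
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:n]
```

For example, two three-state chains in which different actions move toward the same goal state both yield constraints over one shared reward vector, and the solver assigns the highest reward to the goal state.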
Related papers
50 items in total
  • [31] An Inverse Reinforcement Learning Method to Infer Reward Function of Intelligent Jammer
    Fan, Youlin
    Jiu, Bo
    Pu, Wenqiang
    Li, Kang
    Zhang, Yu
    Liu, Hongwei
    Proceedings of the IEEE Radar Conference, 2023,
  • [32] A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines
    Zhou, Weichao
    Li, Wenchao
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [33] Towards Interpretable Deep Reinforcement Learning Models via Inverse Reinforcement Learning
    Xie, Yuansheng
    Vosoughi, Soroush
    Hassanpour, Saeed
    2022 26TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2022, : 5067 - 5074
  • [34] On Reward-Free Reinforcement Learning with Linear Function Approximation
    Wang, Ruosong
    Du, Simon S.
    Yang, Lin F.
    Salakhutdinov, Ruslan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [35] Reward-Relevance-Filtered Linear Offline Reinforcement Learning
    Zhou, Angela
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [36] Inverse reinforcement learning methods for linear differential games
    Asl, Hamed Jabbari
    Uchibe, Eiji
    SYSTEMS & CONTROL LETTERS, 2024, 193
  • [37] Inverse Reinforcement Learning Control for Linear Multiplayer Games
    Lian, Bosen
    Donge, Vrushabh S.
    Lewis, Frank L.
    Chai, Tianyou
    Davoudi, Ali
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 2839 - 2844
  • [38] Linear inverse reinforcement learning in continuous time and space
    Kamalapurkar, Rushikesh
    2018 ANNUAL AMERICAN CONTROL CONFERENCE (ACC), 2018, : 1683 - 1688
  • [39] Learning Fairness from Demonstrations via Inverse Reinforcement Learning
    Blandin, Jack
    Kash, Ian
    PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024, 2024, : 51 - 61
  • [40] Learning Tasks in Intelligent Environments via Inverse Reinforcement Learning
    Shah, Syed Ihtesham Hussain
    Coronato, Antonio
    2021 17TH INTERNATIONAL CONFERENCE ON INTELLIGENT ENVIRONMENTS (IE), 2021,