Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Cited by: 0
Authors
Nakata Y. [1 ]
Arai S. [2 ]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Inverse reinforcement learning; Linear programming
DOI
10.1527/tjsai.B-J23
Abstract
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given in a single fixed environment, but in some cases they are collected across multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments together with the expert policy for each environment. We assume that the experts in the multiple environments share a single reward function, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a linear program that maximizes the sum of the original objective functions of the individual environments while satisfying the optimality conditions of all given environments. Satisfying the conditions of all given environments is a necessary condition for matching the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy grid world environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments. © 2019, Japanese Society for Artificial Intelligence. All rights reserved.
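The extension the abstract describes can be sketched as a single linear program built from Ng and Russell's LP IRL conditions, with the optimality constraints of every environment imposed jointly and the per-environment objectives summed. The following is a minimal sketch under assumed discrete tabular MDPs with a deterministic expert policy per environment; the function name `multi_env_lp_irl`, the L1 penalty weight, and the reward bound are illustrative choices, not details taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def multi_env_lp_irl(transitions, policies, gamma=0.9, l1=1.0, r_max=1.0):
    """Estimate one reward consistent with expert policies in several MDPs.

    transitions: list of arrays P[a, s, s'] (one per environment)
    policies:    list of int arrays pi[s] (deterministic expert policy per env)
    Returns a reward vector R over the shared state space.
    """
    n = transitions[0].shape[1]
    n_env = len(transitions)
    # Variable layout: [R (n), slack t_{e,s} (n per env), u (n) with u >= |R|]
    n_var = n + n_env * n + n
    c = np.zeros(n_var)
    c[n:n + n_env * n] = -1.0      # maximize summed slacks over all envs
    c[n + n_env * n:] = l1         # L1 penalty encouraging sparse rewards
    A_ub, b_ub = [], []
    for e, (P, pi) in enumerate(zip(transitions, policies)):
        P_pi = P[pi, np.arange(n), :]               # expert transition matrix
        inv = np.linalg.inv(np.eye(n) - gamma * P_pi)
        for a in range(P.shape[0]):
            diff = (P_pi - P[a]) @ inv              # one row per state
            for s in range(n):
                if a == pi[s]:
                    continue
                # Optimality of the expert in env e: diff[s] @ R >= 0
                row = np.zeros(n_var); row[:n] = -diff[s]
                A_ub.append(row); b_ub.append(0.0)
                # Slack bounded by the tightest margin: t_{e,s} <= diff[s] @ R
                row = np.zeros(n_var); row[:n] = -diff[s]
                row[n + e * n + s] = 1.0
                A_ub.append(row); b_ub.append(0.0)
    # Enforce u >= |R| via R - u <= 0 and -R - u <= 0
    for s in range(n):
        for sign in (1.0, -1.0):
            row = np.zeros(n_var)
            row[s] = sign; row[n + n_env * n + s] = -1.0
            A_ub.append(row); b_ub.append(0.0)
    bounds = ([(-r_max, r_max)] * n
              + [(None, None)] * (n_env * n)
              + [(0.0, None)] * n)
    res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:n]
```

On a toy pair of two-state, two-action MDPs whose dynamics differ but whose experts both steer toward the same state, the program recovers a reward that is strictly higher in that state, since the optimality constraints of both environments must hold simultaneously.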
Related papers
50 items in total
  • [21] Towards Accurate And Robust Dynamics and Reward Modeling for Model-Based Offline Inverse Reinforcement Learning
    Zhang, Gengyu
    Yan, Yan
    2024 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS 2024, 2024, : 611 - 618
  • [22] Regularized Multiple Criteria Linear Programming via Linear Programming
    Qi, Zhiquan
    Tian, Yingjie
    Shi, Yong
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE, ICCS 2012, 2012, 9 : 1234 - 1239
  • [23] Predicting driving behavior using inverse reinforcement learning with multiple reward functions towards environmental diversity
    Shimosaka, Masamichi
    Nishi, Kentaro
    Sato, Junichi
    Kataoka, Hirokatsu
    2015 IEEE INTELLIGENT VEHICLES SYMPOSIUM (IV), 2015, : 567 - 572
  • [24] Bayesian Inverse Reinforcement Learning-based Reward Learning for Automated Driving
    Zeng, Di
    Zheng, Ling
    Li, Yinong
    Yang, Xiantong
    Jixie Gongcheng Xuebao/Journal of Mechanical Engineering, 2024, 60 (10): : 245 - 260
  • [25] Machining sequence learning via inverse reinforcement learning
    Sugisawa, Yasutomo
    Takasugi, Keigo
    Asakawa, Naoki
    PRECISION ENGINEERING-JOURNAL OF THE INTERNATIONAL SOCIETIES FOR PRECISION ENGINEERING AND NANOTECHNOLOGY, 2022, 73 : 477 - 487
  • [26] Parallel reinforcement learning using multiple reward signals
    Raicevic, Peter
    NEUROCOMPUTING, 2006, 69 (16-18) : 2171 - 2179
  • [27] Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation
    Kim, Woo Kyung
    Yoo, Minjong
    Woo, Honguk
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4300 - 4307
  • [28] Off-Dynamics Inverse Reinforcement Learning
    Kang, Yachen
    Liu, Jinxin
    Wang, Donglin
    IEEE ACCESS, 2024, 12 : 65117 - 65127
  • [29] Transfer in Inverse Reinforcement Learning for Multiple Strategies
    Tanwani, Ajay Kumar
    Billard, Aude
    2013 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2013, : 3244 - 3250
  • [30] Haptic Assistance via Inverse Reinforcement Learning
    Scobee, Dexter R. R.
    Royo, Vicenc Rubies
    Tomlin, Claire J.
    Sastry, S. Shankar
    2018 IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS (SMC), 2018, : 1510 - 1517