Estimating consistent reward of expert in multiple dynamics via linear programming inverse reinforcement learning

Citations: 0
Authors
Nakata Y. [1 ]
Arai S. [2 ]
Affiliations
[1] Department of Urban Environment Systems, Graduate School of Science and Engineering, Chiba University
[2] Department of Urban Environment Systems, Graduate School of Engineering, Chiba University
Keywords
Inverse reinforcement learning; Linear programming
DOI
10.1527/tjsai.B-J23
Abstract
Reinforcement learning is a powerful framework for decision making and control, but it requires a manually specified reward function. Inverse reinforcement learning (IRL) automatically recovers a reward function from an expert's policy or demonstrations. Most existing IRL algorithms assume that the expert policy or demonstrations are given in a single fixed environment, but in some cases they are collected in multiple environments. In this work, we propose an IRL algorithm that is guaranteed to recover reward functions from the models of multiple environments together with the expert policy for each environment. We assume that the expert shares a single reward function across the environments, and we estimate reward functions for which each expert policy is optimal in its corresponding environment. To handle policies in multiple environments, we extend linear programming IRL: our method solves a linear program that maximizes the sum of the original objective functions of the individual environments while satisfying the optimality conditions of all given environments. Satisfying the conditions of all given environments is a necessary condition for a reward to match the expert's reward, and the reward estimated by the proposed method satisfies this necessary condition. In experiments on Windy grid world environments, we demonstrate that our algorithm recovers reward functions for which the expert policies are optimal in their corresponding environments. © 2019, Japanese Society for Artificial Intelligence. All rights reserved.
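For orientation, the following is a rough sketch of how such an extension can be read against the standard linear-programming IRL formulation of Ng and Russell (2000), which the abstract says is extended; the notation (K environments, transition matrices P_a^{(k)}, expert policies \pi_k, regularization parameter \lambda, reward bound R_max) is illustrative and may differ from the paper's own. Each per-environment term below is the usual single-environment LP-IRL objective, and the shared reward vector R is constrained by the optimality conditions of every environment at once:
\[
\max_{R}\ \sum_{k=1}^{K}\sum_{s}\ \min_{a \neq \pi_k(s)} \big(P^{(k)}_{\pi_k(s)}(s) - P^{(k)}_{a}(s)\big)\big(I - \gamma P^{(k)}_{\pi_k}\big)^{-1} R \;-\; \lambda \lVert R \rVert_1
\]
\[
\text{s.t.}\quad \big(P^{(k)}_{\pi_k} - P^{(k)}_{a}\big)\big(I - \gamma P^{(k)}_{\pi_k}\big)^{-1} R \succeq 0 \quad \text{for all } k \text{ and all } a \neq \pi_k, \qquad |R(s)| \le R_{\max},
\]
where P^{(k)}_a(s) denotes the row of environment k's transition matrix for state s under action a. Imposing the constraint set of every environment is what makes satisfying all environments' optimality conditions a necessary condition on the shared reward, as stated in the abstract.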
Related papers (50 in total)
  • [1] Bayesian inverse reinforcement learning for demonstrations of an expert in multiple dynamics: Toward estimation of transferable reward
    Nakata, Yusuke
    Arai, Sachiyo
    Transactions of the Japanese Society for Artificial Intelligence, 2020, 35 (01)
  • [2] Inverse Reinforcement Learning with Locally Consistent Reward Functions
    Quoc Phong Nguyen
    Low, Kian Hsiang
    Jaillet, Patrick
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [3] Reward Identification in Inverse Reinforcement Learning
    Kim, Kuno
    Garg, Shivam
    Shiragur, Kirankumar
    Ermon, Stefano
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Compatible Reward Inverse Reinforcement Learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Convergence Rates of Average-Reward Multi-agent Reinforcement Learning via Randomized Linear Programming
    Koppel, Alec
    Bedi, Amrit Singh
    Ganguly, Bhargav
    Aggarwal, Vaneet
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 4545 - 4552
  • [6] Active Learning for Reward Estimation in Inverse Reinforcement Learning
    Lopes, Manuel
    Melo, Francisco
    Montesano, Luis
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782: 31+
  • [7] Mapping Language to Programs using Multiple Reward Components with Inverse Reinforcement Learning
    Ghosh, Sayan
    Srivastava, Shashank
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 1449 - 1462
  • [8] Option compatible reward inverse reinforcement learning
    Hwang, Rakhoon
    Lee, Hanjin
    Hwang, Hyung Ju
    PATTERN RECOGNITION LETTERS, 2022, 154 : 83 - 89
  • [9] Inverse Reinforcement Learning with the Average Reward Criterion
    Wu, Feiyang
    Ke, Jingyang
    Wu, Anqi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] Constrained Bayesian Reinforcement Learning via Approximate Linear Programming
    Lee, Jongmin
    Jang, Youngsoo
    Poupart, Pascal
    Kim, Kee-Eung
    PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2088 - 2095