OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Cited by: 0
Authors
Hoshino, Hana [1]
Ota, Kei [1,2]
Kanezaki, Asako [1]
Yokota, Rio [3]
Affiliations
[1] Tokyo Institute of Technology, Department of Computer Science, School of Computing, Tokyo, Japan
[2] Mitsubishi Electric Corporation, Information Technology R&D Center, Tokyo, Japan
[3] Tokyo Institute of Technology, Global Scientific Information and Computing Center, Tokyo, Japan
Source
2022 IEEE International Conference on Robotics and Automation (ICRA 2022) | 2022
Keywords
Imitation Learning; Transfer Learning; Learning from Demonstration; Inverse Reinforcement Learning
DOI
10.1109/ICRA.46639.2022.9811660
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts an off-policy data distribution instead of an on-policy one, significantly reducing the number of interactions with the environment, (2) learns a reward function that is transferable and generalizes well under changing dynamics, and (3) leverages mode-covering behavior for faster convergence. Our experiments demonstrate that OPIRL is considerably more sample-efficient and generalizes to novel environments: it achieves better or comparable policy performance than the baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function transfers to different tasks where prior methods are prone to fail.
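The abstract's first claim, replacing on-policy rollouts with an off-policy data distribution for reward learning, can be illustrated with a short sketch. The Python/PyTorch snippet below is not the authors' implementation; the RewardNet and ReplayBuffer classes, the network sizes, and the logistic discriminator loss are illustrative assumptions. It only shows the general pattern the abstract describes: expert transitions are contrasted against transitions re-sampled from a replay buffer, so each reward update reuses past experience instead of requiring fresh rollouts from the current policy. OPIRL's actual objective (distribution matching with a mode-covering regularizer) and its policy update differ in detail.

# Illustrative sketch only: off-policy reward learning against a replay buffer.
import random
from collections import deque

import torch
import torch.nn as nn


class RewardNet(nn.Module):
    # Learned reward r_theta(s, a); trained to score expert transitions higher
    # than transitions drawn from the replay buffer.
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


class ReplayBuffer:
    # Stores past policy transitions so reward updates reuse off-policy data
    # instead of requiring fresh on-policy rollouts.
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def add(self, obs, act):
        self.buf.append((obs, act))

    def sample(self, n):
        batch = random.sample(list(self.buf), min(n, len(self.buf)))
        obs, act = zip(*batch)
        return torch.stack(obs), torch.stack(act)


def reward_update(reward_net, optimizer, expert_obs, expert_act, buffer, batch_size=64):
    # One discriminator-style update: expert transitions labelled 1,
    # replay-buffer (policy) transitions labelled 0.
    pol_obs, pol_act = buffer.sample(batch_size)
    logits_expert = reward_net(expert_obs, expert_act)
    logits_policy = reward_net(pol_obs, pol_act)
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits_expert, torch.ones_like(logits_expert)
    ) + nn.functional.binary_cross_entropy_with_logits(
        logits_policy, torch.zeros_like(logits_policy)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    obs_dim, act_dim = 4, 2
    reward_net = RewardNet(obs_dim, act_dim)
    optimizer = torch.optim.Adam(reward_net.parameters(), lr=3e-4)
    buffer = ReplayBuffer()
    # Random data stands in for real expert demonstrations and agent experience.
    for _ in range(256):
        buffer.add(torch.randn(obs_dim), torch.randn(act_dim))
    expert_obs, expert_act = torch.randn(64, obs_dim), torch.randn(64, act_dim)
    print("reward loss:", reward_update(reward_net, optimizer, expert_obs, expert_act, buffer))

In a full pipeline, an off-policy RL agent would then be trained on the rewards produced by the learned reward network; because both the reward and the policy updates draw on stored experience, far fewer environment interactions are needed than in on-policy adversarial IRL, which is the sample-efficiency gain the abstract highlights.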
Pages: 7
Related Papers
49 records in total
  • [1] Off-Dynamics Inverse Reinforcement Learning
    Kang, Yachen
    Liu, Jinxin
    Wang, Donglin
    IEEE ACCESS, 2024, 12 : 65117 - 65127
  • [2] Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review
    Zhang, K.
    Yu, Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (02): 254 - 261
  • [3] Machining sequence learning via inverse reinforcement learning
    Sugisawa, Yasutomo
    Takasugi, Keigo
    Asakawa, Naoki
    PRECISION ENGINEERING-JOURNAL OF THE INTERNATIONAL SOCIETIES FOR PRECISION ENGINEERING AND NANOTECHNOLOGY, 2022, 73 : 477 - 487
  • [4] On the use of the policy gradient and Hessian in inverse reinforcement learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    INTELLIGENZA ARTIFICIALE, 2020, 14 (01) : 117 - 150
  • [5] Learning From Demonstrations: A Computationally Efficient Inverse Reinforcement Learning Approach With Simplified Implementation
    Lin, Yanbin
    Ni, Zhen
    Zhong, Xiangnan
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025,
  • [6] Learning Fairness from Demonstrations via Inverse Reinforcement Learning
    Blandin, Jack
    Kash, Ian
    PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024, 2024, : 51 - 61
  • [7] Efficient Deep Reinforcement Learning via Policy-Extended Successor Feature Approximator
    Li, Yining
    Yang, Tianpei
    Hao, Jianye
    Zheng, Yan
    Tang, Hongyao
    DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2022, 2023, 13824 : 29 - 44
  • [8] Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
    Blondé, Lionel
    Strasser, Pablo
    Kalousis, Alexandros
    MACHINE LEARNING, 2022, 111 (04) : 1431 - 1521
  • [9] Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling
    Sosic, Adrian
    Zoubir, Abdelhak M.
    Rueckert, Elmar
    Peters, Jan
    Koeppl, Heinz
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19