OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Cited by: 0
Authors
Hoshino, Hana [1]
Ota, Kei [1,2]
Kanezaki, Asako [1]
Yokota, Rio [3]
Affiliations
[1] Tokyo Institute of Technology, Department of Computer Science, School of Computing, Tokyo, Japan
[2] Mitsubishi Electric Corporation, Information Technology R&D Center, Tokyo, Japan
[3] Tokyo Institute of Technology, Global Scientific Information and Computing Center, Tokyo, Japan
Source
2022 IEEE International Conference on Robotics and Automation (ICRA 2022) | 2022
Keywords
Imitation Learning; Transfer Learning; Learning from Demonstration; Inverse Reinforcement Learning
DOI
10.1109/ICRA.46639.2022.9811660
CLC Number
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts an off-policy data distribution instead of an on-policy one, significantly reducing the number of interactions with the environment, (2) learns a reward function that is transferable and generalizes well under changing dynamics, and (3) leverages mode-covering behavior for faster convergence. Our experiments demonstrate that OPIRL is considerably more sample-efficient and generalizes to novel environments: it achieves better or comparable policy performance than the baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function transfers to different tasks where prior methods are prone to fail.
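The abstract's first claim, replacing on-policy rollouts with an off-policy data distribution for reward learning, can be illustrated with a short sketch. The Python/PyTorch snippet below is not the authors' implementation; the RewardNet and ReplayBuffer classes, the network sizes, and the logistic discriminator loss are illustrative assumptions. It only shows the general pattern the abstract describes: expert transitions are contrasted against transitions re-sampled from a replay buffer, so each reward update reuses past experience instead of requiring fresh rollouts from the current policy. OPIRL's actual objective (distribution matching with a mode-covering regularizer) and its policy update differ in detail.

# Illustrative sketch only: off-policy reward learning against a replay buffer.
import random
from collections import deque

import torch
import torch.nn as nn


class RewardNet(nn.Module):
    # Learned reward r_theta(s, a); trained to score expert transitions higher
    # than transitions drawn from the replay buffer.
    def __init__(self, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)


class ReplayBuffer:
    # Stores past policy transitions so reward updates reuse off-policy data
    # instead of requiring fresh on-policy rollouts.
    def __init__(self, capacity=100_000):
        self.buf = deque(maxlen=capacity)

    def add(self, obs, act):
        self.buf.append((obs, act))

    def sample(self, n):
        batch = random.sample(list(self.buf), min(n, len(self.buf)))
        obs, act = zip(*batch)
        return torch.stack(obs), torch.stack(act)


def reward_update(reward_net, optimizer, expert_obs, expert_act, buffer, batch_size=64):
    # One discriminator-style update: expert transitions labelled 1,
    # replay-buffer (policy) transitions labelled 0.
    pol_obs, pol_act = buffer.sample(batch_size)
    logits_expert = reward_net(expert_obs, expert_act)
    logits_policy = reward_net(pol_obs, pol_act)
    loss = nn.functional.binary_cross_entropy_with_logits(
        logits_expert, torch.ones_like(logits_expert)
    ) + nn.functional.binary_cross_entropy_with_logits(
        logits_policy, torch.zeros_like(logits_policy)
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    obs_dim, act_dim = 4, 2
    reward_net = RewardNet(obs_dim, act_dim)
    optimizer = torch.optim.Adam(reward_net.parameters(), lr=3e-4)
    buffer = ReplayBuffer()
    # Random data stands in for real expert demonstrations and agent experience.
    for _ in range(256):
        buffer.add(torch.randn(obs_dim), torch.randn(act_dim))
    expert_obs, expert_act = torch.randn(64, obs_dim), torch.randn(64, act_dim)
    print("reward loss:", reward_update(reward_net, optimizer, expert_obs, expert_act, buffer))

In a full pipeline, an off-policy RL agent would then be trained on the rewards produced by the learned reward network; because both the reward and the policy updates draw on stored experience, far fewer environment interactions are needed than in on-policy adversarial IRL, which is the sample-efficiency gain the abstract highlights.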
Pages: 7
Related Papers
49 records in total
  • [1] Off-Dynamics Inverse Reinforcement Learning
    Kang, Yachen
    Liu, Jinxin
    Wang, Donglin
    IEEE ACCESS, 2024, 12 : 65117 - 65127
  • [2] Methodologies for Imitation Learning via Inverse Reinforcement Learning: A Review
    Zhang, K.
    Yu, Y.
    Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56 (02): 254 - 261
  • [3] Machining sequence learning via inverse reinforcement learning
    Sugisawa, Yasutomo
    Takasugi, Keigo
    Asakawa, Naoki
    PRECISION ENGINEERING-JOURNAL OF THE INTERNATIONAL SOCIETIES FOR PRECISION ENGINEERING AND NANOTECHNOLOGY, 2022, 73 : 477 - 487
  • [4] On the use of the policy gradient and Hessian in inverse reinforcement learning
    Metelli, Alberto Maria
    Pirotta, Matteo
    Restelli, Marcello
    INTELLIGENZA ARTIFICIALE, 2020, 14 (01) : 117 - 150
  • [5] Learning From Demonstrations: A Computationally Efficient Inverse Reinforcement Learning Approach With Simplified Implementation
    Lin, Yanbin
    Ni, Zhen
    Zhong, Xiangnan
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2025,
  • [6] Learning Fairness from Demonstrations via Inverse Reinforcement Learning
    Blandin, Jack
    Kash, Ian
    PROCEEDINGS OF THE 2024 ACM CONFERENCE ON FAIRNESS, ACCOUNTABILITY, AND TRANSPARENCY, ACM FACCT 2024, 2024, : 51 - 61
  • [7] Efficient Deep Reinforcement Learning via Policy-Extended Successor Feature Approximator
    Li, Yining
    Yang, Tianpei
    Hao, Jianye
    Zheng, Yan
    Tang, Hongyao
    DISTRIBUTED ARTIFICIAL INTELLIGENCE, DAI 2022, 2023, 13824 : 29 - 44
  • [8] Lipschitzness is all you need to tame off-policy generative adversarial imitation learning
    Blondé, Lionel
    Strasser, Pablo
    Kalousis, Alexandros
    MACHINE LEARNING, 2022, 111 (04) : 1431 - 1521
  • [9] Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling
    Sosic, Adrian
    Zoubir, Abdelhak M.
    Rueckert, Elmar
    Peters, Jan
    Koeppl, Heinz
    JOURNAL OF MACHINE LEARNING RESEARCH, 2018, 19