OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Cited by: 0
Authors
Hoshino, Hana [1 ]
Ota, Kei [1 ,2 ]
Kanezaki, Asako [1 ]
Yokota, Rio [3 ]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Sch Comp, Tokyo, Japan
[2] Mitsubishi Electr Corp, Informat Technol R&D Ctr, Tokyo, Japan
[3] Tokyo Inst Technol, Global Sci Informat & Comp Ctr, Tokyo, Japan
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022) | 2022
Keywords
Imitation Learning; Transfer Learning; Learning from Demonstration; Inverse Reinforcement Learning;
DOI
10.1109/ICRA.46639.2022.9811660
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts an off-policy data distribution instead of an on-policy one, enabling a significant reduction in the number of interactions with the environment, (2) learns a reward function that is transferable, with high generalization capability under changing dynamics, and (3) leverages mode-covering behavior for faster convergence. Through experiments, we demonstrate that our method is considerably more sample efficient and generalizes to novel environments. Our method achieves better than or comparable policy performance relative to baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior arts are prone to fail.
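To illustrate the core idea of distribution matching with off-policy data, the sketch below trains a simple logistic discriminator to separate expert transitions from samples drawn from a replay buffer, then reads off a reward as the discriminator's logit (as in AIRL-style formulations, where r = log D - log(1 - D)). This is a hedged toy example on synthetic 2-D features, not the paper's implementation: the names `expert`, `replay_buffer`, and the feature dimensions are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy transition features: expert data clusters near +1, off-policy
# replay-buffer data clusters near -1.
expert = rng.normal(1.0, 0.5, size=(256, 2))
replay_buffer = rng.normal(-1.0, 0.5, size=(256, 2))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Logistic discriminator D(s) = sigmoid(w @ s + b).
w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(200):
    # Gradient ascent on E_expert[log D] + E_buffer[log(1 - D)].
    d_exp = sigmoid(expert @ w + b)
    d_buf = sigmoid(replay_buffer @ w + b)
    grad_w = expert.T @ (1 - d_exp) / len(expert) \
             - replay_buffer.T @ d_buf / len(replay_buffer)
    grad_b = (1 - d_exp).mean() - d_buf.mean()
    w += lr * grad_w
    b += lr * grad_b

# Recovered reward r(s) = log D - log(1 - D), i.e. the logit.
r_expert = (expert @ w + b).mean()
r_policy = (replay_buffer @ w + b).mean()
print(r_expert > r_policy)  # expert transitions receive higher reward
```

Because the discriminator is fit against buffer samples rather than fresh on-policy rollouts, each environment interaction can be reused across many updates, which is the source of the sample-efficiency gain the abstract describes.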
Pages: 7
Related Papers
49 records in total
  • [21] Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
    Suh, H. J. Terry
    Chou, Glen
    Dai, Hongkai
    Yang, Lujie
    Gupta, Abhishek
    Tedrake, Russ
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [22] Dynamic QoS Prediction With Intelligent Route Estimation Via Inverse Reinforcement Learning
    Li, Jiahui
    Wu, Hao
    He, Qiang
    Zhao, Yiji
    Wang, Xin
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (02) : 509 - 523
  • [23] Learning Variable Impedance Control via Inverse Reinforcement Learning for Force-Related Tasks
    Zhang, Xiang
    Sun, Liting
    Kuang, Zhian
    Tomizuka, Masayoshi
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 2225 - 2232
  • [24] Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning
    Wang, Tianyu
    Dhiman, Vikas
    Atanasov, Nikolay
    AUTONOMOUS ROBOTS, 2023, 47 (06) : 809 - 830
  • [26] Decision Making for Autonomous Driving via Augmented Adversarial Inverse Reinforcement Learning
    Wang, Pin
    Liu, Dapeng
    Chen, Jiayu
    Li, Hanhan
    Chan, Ching-Yao
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 1036 - 1042
  • [27] Future Trajectory Prediction via RNN and Maximum Margin Inverse Reinforcement Learning
    Choi, Dooseop
    An, Taeg-Hyun
    Ahn, Kyounghwan
    Choi, Jeongdan
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 125 - 130
  • [28] Expert-Trajectory-Based Features for Apprenticeship Learning via Inverse Reinforcement Learning for Robotic Manipulation
    Naranjo-Campos, Francisco J.
    Victores, Juan G.
    Balaguer, Carlos
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [29] Efficient Bayesian Policy Reuse With a Scalable Observation Model in Deep Reinforcement Learning
    Liu, Jinmei
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14797 - 14809
  • [30] Unmanned Ground Vehicle Behavior Decision via Improved Bayesian Inverse Reinforcement Learning
    Xing, Wen-zhi
    Wang, Zhu-ping
    CURRENT TRENDS IN COMPUTER SCIENCE AND MECHANICAL AUTOMATION (CSMA), VOL 2, 2017, : 151 - 161