OPIRL: Sample Efficient Off-Policy Inverse Reinforcement Learning via Distribution Matching

Cited by: 0
Authors
Hoshino, Hana [1 ]
Ota, Kei [1 ,2 ]
Kanezaki, Asako [1 ]
Yokota, Rio [3 ]
Affiliations
[1] Tokyo Inst Technol, Dept Comp Sci, Sch Comp, Tokyo, Japan
[2] Mitsubishi Electr Corp, Informat Technol R&D Ctr, Tokyo, Japan
[3] Tokyo Inst Technol, Global Sci Informat & Comp Ctr, Tokyo, Japan
Source
2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022) | 2022
Keywords
Imitation Learning; Transfer Learning; Learning from Demonstration; Inverse Reinforcement Learning;
DOI
10.1109/ICRA.46639.2022.9811660
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Inverse Reinforcement Learning (IRL) is attractive in scenarios where reward engineering can be tedious. However, prior IRL algorithms use on-policy transitions, which require intensive sampling from the current policy for stable and optimal performance. This limits IRL applications in the real world, where environment interactions can become highly expensive. To tackle this problem, we present Off-Policy Inverse Reinforcement Learning (OPIRL), which (1) adopts an off-policy data distribution instead of an on-policy one, enabling a significant reduction in the number of interactions with the environment, (2) learns a reward function that is transferable, with high generalization capability under changing dynamics, and (3) leverages mode-covering behavior for faster convergence. Through experiments, we demonstrate that our method is considerably more sample efficient and generalizes to novel environments. Our method achieves better than or comparable policy performance relative to baselines with significantly fewer interactions. Furthermore, we empirically show that the recovered reward function generalizes to different tasks where prior arts are prone to fail.
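To illustrate the core idea of distribution matching with off-policy data, the sketch below trains a simple logistic discriminator to separate expert transitions from samples drawn from a replay buffer, then reads off a reward as the discriminator's logit (as in AIRL-style formulations, where r = log D - log(1 - D)). This is a hedged toy example on synthetic 2-D features, not the paper's implementation: the names `expert`, `replay_buffer`, and the feature dimensions are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy transition features: expert data clusters near +1, off-policy
# replay-buffer data clusters near -1.
expert = rng.normal(1.0, 0.5, size=(256, 2))
replay_buffer = rng.normal(-1.0, 0.5, size=(256, 2))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Logistic discriminator D(s) = sigmoid(w @ s + b).
w = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(200):
    # Gradient ascent on E_expert[log D] + E_buffer[log(1 - D)].
    d_exp = sigmoid(expert @ w + b)
    d_buf = sigmoid(replay_buffer @ w + b)
    grad_w = expert.T @ (1 - d_exp) / len(expert) \
             - replay_buffer.T @ d_buf / len(replay_buffer)
    grad_b = (1 - d_exp).mean() - d_buf.mean()
    w += lr * grad_w
    b += lr * grad_b

# Recovered reward r(s) = log D - log(1 - D), i.e. the logit.
r_expert = (expert @ w + b).mean()
r_policy = (replay_buffer @ w + b).mean()
print(r_expert > r_policy)  # expert transitions receive higher reward
```

Because the discriminator is fit against buffer samples rather than fresh on-policy rollouts, each environment interaction can be reused across many updates, which is the source of the sample-efficiency gain the abstract describes.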
Pages: 7
Related Papers
49 records in total
  • [21] Fighting Uncertainty with Gradients: Offline Reinforcement Learning via Diffusion Score Matching
    Suh, H. J. Terry
    Chou, Glen
    Dai, Hongkai
    Yang, Lujie
    Gupta, Abhishek
    Tedrake, Russ
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [22] Dynamic QoS Prediction With Intelligent Route Estimation Via Inverse Reinforcement Learning
    Li, Jiahui
    Wu, Hao
    He, Qiang
    Zhao, Yiji
    Wang, Xin
    IEEE TRANSACTIONS ON SERVICES COMPUTING, 2024, 17 (02) : 509 - 523
  • [23] Learning Variable Impedance Control via Inverse Reinforcement Learning for Force-Related Tasks
    Zhang, Xiang
    Sun, Liting
    Kuang, Zhian
    Tomizuka, Masayoshi
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 2225 - 2232
  • [24] Inverse reinforcement learning for autonomous navigation via differentiable semantic mapping and planning
    Wang, Tianyu
    Dhiman, Vikas
    Atanasov, Nikolay
    AUTONOMOUS ROBOTS, 2023, 47 (06) : 809 - 830
  • [26] Decision Making for Autonomous Driving via Augmented Adversarial Inverse Reinforcement Learning
    Wang, Pin
    Liu, Dapeng
    Chen, Jiayu
    Li, Hanhan
    Chan, Ching-Yao
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 1036 - 1042
  • [27] Future Trajectory Prediction via RNN and Maximum Margin Inverse Reinforcement Learning
    Choi, Dooseop
    An, Taeg-Hyun
    Ahn, Kyounghwan
    Choi, Jeongdan
    2018 17TH IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS (ICMLA), 2018, : 125 - 130
  • [28] Expert-Trajectory-Based Features for Apprenticeship Learning via Inverse Reinforcement Learning for Robotic Manipulation
    Naranjo-Campos, Francisco J.
    Victores, Juan G.
    Balaguer, Carlos
    APPLIED SCIENCES-BASEL, 2024, 14 (23):
  • [29] Efficient Bayesian Policy Reuse With a Scalable Observation Model in Deep Reinforcement Learning
    Liu, Jinmei
    Wang, Zhi
    Chen, Chunlin
    Dong, Daoyi
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (10) : 14797 - 14809
  • [30] Unmanned Ground Vehicle Behavior Decision via Improved Bayesian Inverse Reinforcement Learning
    Xing, Wen-zhi
    Wang, Zhu-ping
    CURRENT TRENDS IN COMPUTER SCIENCE AND MECHANICAL AUTOMATION (CSMA), VOL 2, 2017, : 151 - 161