Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation

Times Cited: 0
Authors
Kim, Woo Kyung [1 ]
Yoo, Minjong [1 ]
Woo, Honguk [1 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Comp Sci & Engn, Suwon, South Korea
Source
PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024 | 2024
Funding
National Research Foundation, Singapore;
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity for sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This limitation is particularly pronounced when multiple conflicting objectives exist and each expert holds a unique optimization preference over them, since obtaining comprehensive datasets covering all preferences is impractical. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates to regularize the discriminator. This enables the progressive generation of a set of policies that accommodate diverse preferences over the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. We thereby present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. The Pareto policy set is then distilled into a single, preference-conditioned diffusion model, allowing users to immediately specify which expert's behavior patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms on various multi-objective control tasks, achieving a dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL to autonomous driving in CARLA.
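The abstract describes regularizing an adversarial IRL discriminator with reward distance estimates so that policies interpolating between two expert preferences can be generated. Below is a minimal, hypothetical PyTorch sketch of that idea; the class names, the preference weight `alpha`, and the exact form of the regularizer are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: adversarial IRL discriminator with a reward-distance
# regularizer, loosely following the abstract. Names, the preference weight
# `alpha`, and the exact regularization form are illustrative assumptions.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs; the logit acts as a learned reward."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def discriminator_loss(disc, expert_a, expert_b, policy_batch, alpha: float):
    """GAIL-style adversarial loss plus a reward-distance term anchoring the
    policy's learned reward between the two expert datasets (the two
    preference endpoints)."""
    bce = nn.BCEWithLogitsLoss()
    exp_logits = torch.cat([disc(*expert_a), disc(*expert_b)])
    pol_logits = disc(*policy_batch)
    adv = (bce(exp_logits, torch.ones_like(exp_logits))
           + bce(pol_logits, torch.zeros_like(pol_logits)))
    # Assumed regularizer: pull the policy's mean reward toward a point at
    # preference-weighted "distance" between the two experts' mean rewards.
    target = alpha * disc(*expert_a).mean() + (1 - alpha) * disc(*expert_b).mean()
    reg = (pol_logits.mean() - target).pow(2)
    return adv + reg

# Toy usage with random tensors standing in for dataset batches.
if __name__ == "__main__":
    obs_dim, act_dim, batch = 4, 2, 32
    disc = Discriminator(obs_dim, act_dim)
    rand = lambda: (torch.randn(batch, obs_dim), torch.randn(batch, act_dim))
    loss = discriminator_loss(disc, rand(), rand(), rand(), alpha=0.3)
    loss.backward()
    print(float(loss))
```

Sweeping `alpha` over [0, 1] in such a scheme would yield a family of recovered rewards (and hence policies) spanning the two expert preferences, which is the intuition behind the progressive Pareto policy set generation the abstract describes.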
Pages: 4300-4307
Page count: 8