Pareto Inverse Reinforcement Learning for Diverse Expert Policy Generation

Times Cited: 0
Authors
Kim, Woo Kyung [1 ]
Yoo, Minjong [1 ]
Woo, Honguk [1 ]
Affiliations
[1] Sungkyunkwan Univ, Dept Comp Sci & Engn, Suwon, South Korea
Source
PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024 | 2024
Funding
National Research Foundation, Singapore;
Keywords
DOI
None available
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Data-driven offline reinforcement learning and imitation learning approaches have been gaining popularity for sequential decision-making problems. Yet, these approaches rarely consider learning Pareto-optimal policies from a limited pool of expert datasets. This limitation is particularly pronounced when multiple conflicting objectives exist and each expert holds a unique optimization preference over them, since obtaining comprehensive datasets covering all preferences is impractical. In this paper, we adapt inverse reinforcement learning (IRL) by using reward distance estimates to regularize the discriminator. This enables the progressive generation of a set of policies that accommodate diverse preferences over the multiple objectives, while using only two distinct datasets, each associated with a different expert preference. We thereby present a Pareto IRL framework (ParIRL) that establishes a Pareto policy set from these limited datasets. The Pareto policy set is then distilled into a single, preference-conditioned diffusion model, allowing users to immediately specify which expert's behavior patterns they prefer. Through experiments, we show that ParIRL outperforms other IRL algorithms on various multi-objective control tasks, achieving a dense approximation of the Pareto frontier. We also demonstrate the applicability of ParIRL to autonomous driving in CARLA.
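The abstract describes regularizing an adversarial IRL discriminator with reward distance estimates so that policies interpolating between two expert preferences can be generated. Below is a minimal, hypothetical PyTorch sketch of that idea; the class names, the preference weight `alpha`, and the exact form of the regularizer are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch: adversarial IRL discriminator with a reward-distance
# regularizer, loosely following the abstract. Names, the preference weight
# `alpha`, and the exact regularization form are illustrative assumptions.
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Scores (state, action) pairs; the logit acts as a learned reward."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def discriminator_loss(disc, expert_a, expert_b, policy_batch, alpha: float):
    """GAIL-style adversarial loss plus a reward-distance term anchoring the
    policy's learned reward between the two expert datasets (the two
    preference endpoints)."""
    bce = nn.BCEWithLogitsLoss()
    exp_logits = torch.cat([disc(*expert_a), disc(*expert_b)])
    pol_logits = disc(*policy_batch)
    adv = (bce(exp_logits, torch.ones_like(exp_logits))
           + bce(pol_logits, torch.zeros_like(pol_logits)))
    # Assumed regularizer: pull the policy's mean reward toward a point at
    # preference-weighted "distance" between the two experts' mean rewards.
    target = alpha * disc(*expert_a).mean() + (1 - alpha) * disc(*expert_b).mean()
    reg = (pol_logits.mean() - target).pow(2)
    return adv + reg

# Toy usage with random tensors standing in for dataset batches.
if __name__ == "__main__":
    obs_dim, act_dim, batch = 4, 2, 32
    disc = Discriminator(obs_dim, act_dim)
    rand = lambda: (torch.randn(batch, obs_dim), torch.randn(batch, act_dim))
    loss = discriminator_loss(disc, rand(), rand(), rand(), alpha=0.3)
    loss.backward()
    print(float(loss))
```

Sweeping `alpha` over [0, 1] in such a scheme would yield a family of recovered rewards (and hence policies) spanning the two expert preferences, which is the intuition behind the progressive Pareto policy set generation the abstract describes.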
Pages: 4300-4307
Page count: 8