Objective Weight Interval Estimation Using Adversarial Inverse Reinforcement Learning

Cited by: 1
Authors
Takayama, Naoya [1 ]
Arai, Sachiyo [1 ]
Affiliations
[1] Chiba Univ, Grad Sch Sci & Engn, Dept Urban Environm Syst, Div Earth & Environm Sci, Chiba 2638522, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
Estimation; Reinforcement learning; Decision making; Generators; Pareto optimization; Aerospace electronics; Deep learning; Deep reinforcement learning; inverse reinforcement learning; multi-objective planning; preference estimation; sequential decision-making
DOI
10.1109/ACCESS.2023.3281593
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Many real-world problems are modeled as multi-objective sequential decision-making problems with multiple competing objectives, and multi-objective reinforcement learning (MORL) has garnered attention as a solution. One of the challenges in obtaining the desired policy with MORL is that the priorities (hereafter, weights) for each objective must be designed in advance to scalarize the reward vector. Determining weights by trial and error burdens system designers, so methods to estimate weights are needed. Existing methods use inverse reinforcement learning (IRL), which is not scalable because it requires running reinforcement learning several times until an optimal policy is obtained. This study proposes a weight interval estimation (WInter) method using adversarial IRL (AIRL). AIRL is a scalable framework that reduces the computational complexity of IRL by estimating rewards and policies simultaneously. WInter estimates the weight interval using the expert neighborhoods obtained during AIRL training. Through experiments in a benchmark environment for multi-objective sequential decision-making problems in a continuous state space, we successfully estimated the weight interval while reducing computational complexity compared with existing methods.
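For context, the scalarization step the abstract refers to can be illustrated with a minimal sketch, assuming linear scalarization (one common choice in MORL); this is not the authors' implementation, and all names below are hypothetical.

    # Minimal sketch of linear scalarization in MORL (illustrative only).
    # The weight vector is what WInter-style methods try to recover from
    # expert behavior, instead of requiring a designer to hand-tune it.
    import numpy as np

    def scalarize(reward_vector: np.ndarray, weights: np.ndarray) -> float:
        # Collapse a vector-valued reward into a scalar by weighting each
        # objective; the policy learned by standard RL then depends
        # entirely on the chosen weights.
        assert reward_vector.shape == weights.shape
        return float(np.dot(weights, reward_vector))

    # Example: two competing objectives (e.g., progress vs. energy cost).
    r = np.array([1.0, -0.5])   # per-step reward for each objective
    w = np.array([0.7, 0.3])    # designer-specified priorities
    print(scalarize(r, w))      # ~0.55

Under this framing, WInter's output is not a single weight vector but an interval of weights consistent with the expert's behavior, which is why the estimation target in the abstract is a weight interval.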
Pages: 58532-58538
Page count: 7