Multi-objective deep inverse reinforcement learning for weight estimation of objectives

Cited by: 0

Authors:

Naoya Takayama

Sachiyo Arai

Affiliation:

[1] Chiba University, Division of Earth and Environmental Sciences, Department of Urban Environment Systems, Graduate School of Science and Engineering

Source:

Artificial Life and Robotics | 2022 / Vol. 27

Keywords:

Sequential decision-making; Multi-objective planning; Preference estimation; Inverse reinforcement learning; Deep reinforcement learning

DOI:

Not available

Abstract:

In multi-objective reinforcement learning, a weight is a parameter that expresses the priority of each objective when the reward vector is linearly scalarized. The weights must be set in advance; however, most real-world problems have numerous objectives, so adjusting the weights requires much trial and error by the designer. A method that estimates the weights automatically is therefore needed to reduce this burden. In this paper, we propose a novel method for estimating the weights from the reward vector for each objective and the expert trajectories, using the framework of inverse reinforcement learning (IRL). In particular, we adopt deep IRL with deep reinforcement learning and multiplicative weights apprenticeship learning for fast weight estimation in a continuous state space. Through experiments in a benchmark environment for multi-objective sequential decision-making problems in a continuous state space, we verified that our weight estimation method is superior to the projection method and Bayesian optimization.
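
The linear scalarization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the function name `scalarize` and the two-objective numbers are hypothetical and chosen only for illustration. The sketch assumes that the scalar reward is the weighted sum of the per-objective rewards.

```python
def scalarize(reward_vector, weights):
    """Return the weighted sum of a per-objective reward vector.

    Hypothetical helper: one weight per objective, a larger weight
    means a higher priority for that objective.
    """
    if len(reward_vector) != len(weights):
        raise ValueError("reward_vector and weights must have the same length")
    return sum(w * r for w, r in zip(weights, reward_vector))


# Hypothetical two-objective example: the first objective is given
# three times the priority of the second.
reward = [1.0, -0.5]
weights = [0.75, 0.25]
scalar_reward = scalarize(reward, weights)
```

Under this assumption, estimating the weights amounts to finding the vector `weights` for which the policy that is optimal for the scalarized reward best matches the expert trajectories.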

Pages: 594-602

Page count: 8
