Objective Weight Interval Estimation Using Adversarial Inverse Reinforcement Learning

Cited by: 1
Authors
Takayama, Naoya [1 ]
Arai, Sachiyo [1 ]
Affiliations
[1] Chiba Univ, Grad Sch Sci & Engn, Dept Urban Environm Syst, Div Earth & Environm Sci, Chiba 2638522, Japan
Funding
Japan Society for the Promotion of Science (JSPS)
Keywords
Estimation; Reinforcement learning; Decision making; Generators; Pareto optimization; Aerospace electronics; Deep learning; Deep reinforcement learning; inverse reinforcement learning; multi-objective planning; preference estimation; sequential decision-making
DOI
10.1109/ACCESS.2023.3281593
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Many real-world problems are modeled as multi-objective sequential decision-making problems with multiple competing objectives, and multi-objective reinforcement learning (MORL) has garnered attention as a solution. One of the challenges in obtaining the desired policy with MORL is that the priorities (hereafter, weights) for each objective must be designed in advance to scalarize the reward vector. Determining weights by trial and error burdens system designers, so methods to estimate weights are needed. Existing methods use inverse reinforcement learning (IRL), which is not scalable because it requires running reinforcement learning several times until an optimal policy is obtained. This study proposes a weight interval estimation (WInter) method using adversarial IRL (AIRL). AIRL is a scalable framework that reduces the computational complexity of IRL by estimating rewards and policies simultaneously. WInter estimates the weight interval using the expert neighborhoods obtained during AIRL training. Through experiments in a benchmark environment for multi-objective sequential decision-making problems in a continuous state space, we successfully estimated the weight interval while reducing computational complexity compared with existing methods.
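For context, the scalarization step the abstract refers to can be illustrated with a minimal sketch, assuming linear scalarization (one common choice in MORL); this is not the authors' implementation, and all names below are hypothetical.

    # Minimal sketch of linear scalarization in MORL (illustrative only).
    # The weight vector is what WInter-style methods try to recover from
    # expert behavior, instead of requiring a designer to hand-tune it.
    import numpy as np

    def scalarize(reward_vector: np.ndarray, weights: np.ndarray) -> float:
        # Collapse a vector-valued reward into a scalar by weighting each
        # objective; the policy learned by standard RL then depends
        # entirely on the chosen weights.
        assert reward_vector.shape == weights.shape
        return float(np.dot(weights, reward_vector))

    # Example: two competing objectives (e.g., progress vs. energy cost).
    r = np.array([1.0, -0.5])   # per-step reward for each objective
    w = np.array([0.7, 0.3])    # designer-specified priorities
    print(scalarize(r, w))      # ~0.55

Under this framing, WInter's output is not a single weight vector but an interval of weights consistent with the expert's behavior, which is why the estimation target in the abstract is a weight interval.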
Pages: 58532-58538
Page count: 7