Neural scalarisation for multi-objective inverse reinforcement learning

Cited by: 0

Authors:

Kishikawa, Daiko [1]

Arai, Sachiyo [1]

Affiliations:

[1] Chiba Univ, Dept Urban Environm Syst, 1-33 Yayoi Cho, Inage Ku, Chiba 2638522, Japan

Source:

SICE Journal of Control, Measurement, and System Integration | 2023 / Vol. 16 / Issue 1

Keywords:

Inverse reinforcement learning; learning from demonstration; multi-objective optimization; scalarisation; neural network

DOI:

10.1080/18824889.2023.2194234

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Multi-objective inverse reinforcement learning (MOIRL) extends inverse reinforcement learning (IRL) to multi-objective problems by estimating weights and multi-objective rewards to help retrain and analyse preference-conditioned behaviour. Unlike previous methods using linear scalarisation, we propose a MOIRL method using neural scalarisation. This method comprises four neural networks: weight mapping, reward, scalarisation and weight back-translation. Additionally, we introduce two stabilization techniques for learning the proposed method. Experiments show that the proposed method can estimate appropriate weights and rewards reflecting true multi-objective intentions. Furthermore, the estimated weights and rewards can be used for retraining to reproduce the expert solutions.
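
The contrast the abstract draws between linear and neural scalarisation can be illustrated with a minimal sketch. Linear scalarisation is the standard weighted sum of per-objective rewards; the neural variant below is only an assumption about what a learned scalarisation function might look like, not the authors' architecture. The function names, the one-hidden-layer shape, and every numeric parameter are hypothetical placeholders chosen for illustration.

```python
import math


def linear_scalarise(rewards, weights):
    """Linear scalarisation: weighted sum of per-objective rewards."""
    return sum(w * r for w, r in zip(weights, rewards))


def neural_scalarise(rewards, weights, hidden_w, hidden_b, out_w, out_b):
    """Hypothetical neural scalarisation: a one-hidden-layer MLP applied to
    the concatenated [rewards, weights] vector. The layer sizes and the
    parameters are illustrative assumptions, not values from the paper."""
    x = list(rewards) + list(weights)
    hidden = [
        math.tanh(sum(wi * xi for wi, xi in zip(row, x)) + b)
        for row, b in zip(hidden_w, hidden_b)
    ]
    return sum(wo * h for wo, h in zip(out_w, hidden)) + out_b


# Two objectives with preference weights that sum to 1.
rewards = [1.0, -0.5]
weights = [0.7, 0.3]

# Weighted sum: 0.7 * 1.0 + 0.3 * (-0.5), approximately 0.55.
linear_value = linear_scalarise(rewards, weights)

# Fixed placeholder parameters; in practice these would be learned.
hidden_w = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]
hidden_b = [0.0, 0.1]
out_w = [1.0, -1.0]
out_b = 0.0
neural_value = neural_scalarise(rewards, weights, hidden_w, hidden_b, out_w, out_b)

print(linear_value, neural_value)
```

The point of the sketch is that a weighted sum can only express linear trade-offs between objectives, whereas a learned function of rewards and weights can in principle represent non-linear ones.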

Pages: 140-151

Page count: 12
