Neural scalarisation for multi-objective inverse reinforcement learning

Times cited: 0
Authors
Kishikawa, Daiko [1]
Arai, Sachiyo [1]
Affiliations
[1] Chiba Univ, Dept Urban Environm Syst, 1-33 Yayoi Cho, Inage Ku, Chiba 2638522, Japan
Keywords
Inverse reinforcement learning; learning from demonstration; multi-objective optimization; scalarisation; neural network
DOI
10.1080/18824889.2023.2194234
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-objective inverse reinforcement learning (MOIRL) extends inverse reinforcement learning (IRL) to multi-objective problems by estimating both the weights and the multi-objective rewards, which support retraining and the analysis of preference-conditioned behaviour. Unlike previous methods, which use linear scalarisation, we propose a MOIRL method based on neural scalarisation. The method comprises four neural networks: weight mapping, reward, scalarisation and weight back-translation. We also introduce two stabilisation techniques for training the proposed method. Experiments show that the proposed method can estimate appropriate weights and rewards that reflect the true multi-objective intentions. Furthermore, the estimated weights and rewards can be used for retraining to reproduce the expert solutions.
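The contrast the abstract draws between linear and neural scalarisation can be illustrated with a minimal sketch. Linear scalarisation is the standard weighted sum of per-objective rewards. The `NeuralScalariser` class below is a hypothetical, untrained stand-in: its input layout, layer sizes, activation and initialisation are assumptions made for illustration and are not taken from the paper, which describes four networks in total.

```python
import numpy as np


def linear_scalarise(weights, rewards):
    """Linear scalarisation: weighted sum of the per-objective rewards."""
    return float(np.dot(weights, rewards))


class NeuralScalariser:
    """Hypothetical one-hidden-layer MLP mapping (weights, rewards) to a scalar.

    Illustrative only: the architecture here is an assumption, not the
    scalarisation network described in the paper. Parameters are random
    and untrained.
    """

    def __init__(self, n_objectives, hidden=16, seed=0):
        rng = np.random.default_rng(seed)
        # Input is the concatenation of the weight and reward vectors.
        self.w1 = rng.normal(scale=0.1, size=(2 * n_objectives, hidden))
        self.b1 = np.zeros(hidden)
        self.w2 = rng.normal(scale=0.1, size=(hidden, 1))
        self.b2 = np.zeros(1)

    def __call__(self, weights, rewards):
        x = np.concatenate([weights, rewards])
        h = np.tanh(x @ self.w1 + self.b1)  # non-linear hidden layer
        return float((h @ self.w2 + self.b2)[0])


weights = np.array([0.7, 0.3])   # preference over two objectives
rewards = np.array([1.0, -2.0])  # per-objective rewards for one step

linear_value = linear_scalarise(weights, rewards)  # 0.7*1.0 + 0.3*(-2.0)
neural_value = NeuralScalariser(n_objectives=2)(weights, rewards)
```

A learned non-linear mapping of this kind can represent preferences that a weighted sum cannot, which is the general motivation for moving beyond linear scalarisation.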
Pages: 140-151
Number of pages: 12