Neural scalarisation for multi-objective inverse reinforcement learning

Times cited: 0
Authors
Kishikawa, Daiko [1]
Arai, Sachiyo [1]
Affiliations
[1] Chiba Univ, Dept Urban Environm Syst, 1-33 Yayoi Cho,Inage Ku, Chiba 2638522, Japan
Keywords
Inverse reinforcement learning; learning from demonstration; multi-objective optimization; scalarisation; neural network
DOI
10.1080/18824889.2023.2194234
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Multi-objective inverse reinforcement learning (MOIRL) extends inverse reinforcement learning (IRL) to multi-objective problems by estimating weights and multi-objective rewards to help retrain and analyse preference-conditioned behaviour. Unlike previous methods, which use linear scalarisation, we propose a MOIRL method using neural scalarisation. This method comprises four neural networks: weight mapping, reward, scalarisation and weight back-translation. Additionally, we introduce two stabilisation techniques for training the proposed method. Experiments show that the proposed method can estimate appropriate weights and rewards reflecting true multi-objective intentions. Furthermore, the estimated weights and rewards can be used for retraining to reproduce the expert solutions.
Pages: 140-151
Page count: 12