Neural scalarisation for multi-objective inverse reinforcement learning

Cited by: 0

Authors:

Kishikawa, Daiko [1]

Arai, Sachiyo [1]

Affiliation:

[1] Chiba University, Department of Urban Environment Systems, 1-33 Yayoi-cho, Inage-ku, Chiba 263-8522, Japan

Source:

SICE JOURNAL OF CONTROL MEASUREMENT AND SYSTEM INTEGRATION | 2023 / Vol. 16 / No. 01

Keywords:

Inverse reinforcement learning; learning from demonstration; multi-objective optimization; scalarisation; neural network;

DOI:

10.1080/18824889.2023.2194234

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812 ;

Abstract:

Multi-objective inverse reinforcement learning (MOIRL) extends inverse reinforcement learning (IRL) to multi-objective problems by estimating weights and multi-objective rewards to help retrain and analyse preference-conditioned behaviour. Unlike previous methods using linear scalarisation, we propose a MOIRL method using neural scalarisation. This method comprises four neural networks: weight mapping, reward, scalarisation and weight back-translation. Additionally, we introduce two stabilization techniques for learning the proposed method. Experiments show that the proposed method can estimate appropriate weights and rewards reflecting true multi-objective intentions. Furthermore, the estimated weights and rewards can be used for retraining to reproduce the expert solutions.
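The abstract contrasts linear scalarisation, used by previous MOIRL methods, with neural scalarisation. A minimal sketch of that difference in plain Python follows: `linear_scalarise` computes the usual weighted sum of a multi-objective reward vector, while `NeuralScalariser` is a hypothetical one-hidden-layer network that maps a preference weight vector and a reward vector to a single scalar reward. All names, the softmax used as a stand-in weight mapping, the layer size, and the tanh activation are illustrative assumptions, not the authors' architecture; training, the reward network, and the weight back-translation network are omitted.

```python
import math
import random


def linear_scalarise(weights, rewards):
    """Linear scalarisation: the weighted sum sum_i w_i * r_i."""
    return sum(w * r for w, r in zip(weights, rewards))


def softmax(logits):
    """Map arbitrary logits onto the probability simplex.

    Assumption: used here only as a stand-in 'weight mapping' so that the
    preference weights are non-negative and sum to one.
    """
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class NeuralScalariser:
    """Hypothetical MLP mapping (weights, rewards) -> scalar reward.

    Input is the concatenation of the weight and reward vectors; one tanh
    hidden layer; linear output. Parameters are randomly initialised and
    untrained, so the output only illustrates the interface.
    """

    def __init__(self, n_objectives, hidden=16, seed=0):
        rng = random.Random(seed)  # seeded so instances are reproducible
        n_in = 2 * n_objectives
        self.w1 = [[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
                   for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, weights, rewards):
        x = list(weights) + list(rewards)
        # Hidden layer: tanh(W1 x + b1)
        h = [math.tanh(sum(wij * xj for wij, xj in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        # Output layer: w2 . h + b2 (a single scalar reward)
        return sum(v * hi for v, hi in zip(self.w2, h)) + self.b2


if __name__ == "__main__":
    w = softmax([0.2, 1.0])  # hypothetical preference logits -> weights
    r = [1.0, -0.5]          # hypothetical two-objective reward vector
    print("linear:", linear_scalarise(w, r))
    print("neural:", NeuralScalariser(n_objectives=2)(w, r))
```

The linear form can only express trade-offs that are weighted sums of the objectives; a learned scalariser of this shape can, in principle, also represent non-linear combinations of weights and rewards.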

References

Pages: 140-151

Number of pages: 12

37 references in total

[21] Kober, Jens; Bagnell, J. Andrew; Peters, Jan. Reinforcement learning in robotics: A survey [J]. International Journal of Robotics Research, 2013, 32(11): 1238-1274.

[22] Kreps, D. M. Notes on the Theory of Choice. 1988.

[23] Mnih, Volodymyr; Kavukcuoglu, Koray; Silver, David; Rusu, Andrei A.; Veness, Joel; Bellemare, Marc G.; Graves, Alex; Riedmiller, Martin; Fidjeland, Andreas K.; Ostrovski, Georg; Petersen, Stig; Beattie, Charles; Sadik, Amir; Antonoglou, Ioannis; King, Helen; Kumaran, Dharshan; Wierstra, Daan; Legg, Shane; Hassabis, Demis. Human-level control through deep reinforcement learning [J]. Nature, 2015, 518(7540): 529-533.

[24] Muelling, Katharina; Boularias, Abdeslam; Mohler, Betty; Schoelkopf, Bernhard; Peters, Jan. Learning strategies in table tennis using inverse reinforcement learning [J]. Biological Cybernetics, 2014, 108(5): 603-619.

[25] Ng, A. Y. Proceedings of the 17th International Conference on Machine Learning, 2000, 1: 2.

[26] Paszke, A. Advances in Neural Information Processing Systems, 2019, 32.

[27] Ramachandran, D. 20th International Joint Conference on Artificial Intelligence, 2007: 2586.

[28] Roijers, Diederik M.; Vamplew, Peter; Whiteson, Shimon; Dazeley, Richard. A Survey of Multi-Objective Sequential Decision-Making [J]. Journal of Artificial Intelligence Research, 2013, 48: 67-113.

[29] Russell, S. Proceedings of the Eleventh Annual Conference on Computational Learning Theory, 1998: 101. DOI: 10.1145/279943.279964.

[30] Srivastava, N. Journal of Machine Learning Research, 2014, 15: 1929.
