Actor-critic multi-objective reinforcement learning for non-linear utility functions

Cited by: 5
Authors
Reymond, Mathieu [1 ]
Hayes, Conor F. [2 ]
Steckelmacher, Denis [1 ]
Roijers, Diederik M. [1 ,3 ]
Nowé, Ann [1 ]
Affiliations
[1] Vrije Univ Brussel, Brussels, Belgium
[2] Univ Galway, Galway, Ireland
[3] HU Univ Appl Sci Utrecht, Utrecht, Netherlands
Keywords
Reinforcement learning; Multi-objective reinforcement learning; Non-linear utility functions; Expected scalarized return; SETS
DOI
10.1007/s10458-023-09604-x
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
We propose a novel multi-objective reinforcement learning algorithm that learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and in terms of the solution concept. A key insight is that a critic which learns a multivariate distribution over the returns, combined with the rewards accumulated so far, lets us optimize the utility function directly, even when it is non-linear. This vastly increases the range of problems that can be solved compared to single-objective methods or multi-objective methods that require linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
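The reason a critic must model a full multivariate distribution over returns, rather than just expected returns, is that for a non-linear utility u, the expected utility E[u(R)] generally differs from the utility of the expected return u(E[R]). The sketch below illustrates this gap numerically; the utility function, the Gaussian return samples, and the accumulated-reward vector are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's method):
# for a non-linear utility u, E[u(R)] != u(E[R]), so a critic that only
# estimates expected returns cannot optimize the utility directly.

def u(returns):
    """Hypothetical non-linear utility over two objectives: their minimum."""
    return np.minimum(returns[..., 0], returns[..., 1])

rng = np.random.default_rng(0)
# Stand-in for samples from a learned multivariate return distribution
# (2 objectives, 10,000 Monte Carlo samples of the future return).
future_returns = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(10_000, 2))
accumulated = np.array([1.0, 3.0])  # rewards gathered so far in the episode

# Utility applied per sample to the total return, then averaged
# (the expected-scalarized-return criterion):
expected_utility = u(accumulated + future_returns).mean()

# Utility of the mean total return (all a non-distributional critic sees):
utility_of_expectation = u(accumulated + future_returns.mean(axis=0))

print(expected_utility, utility_of_expectation)  # the two values differ
```

Because this u is concave, Jensen's inequality makes the expected utility strictly smaller than the utility of the expectation; a policy ranked by the latter can therefore be suboptimal under the former, which motivates combining a distributional critic with the accumulated reward.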
Pages: 30
Related papers
50 records in total
  • [11] The Need for MORE: Need Systems as Non-Linear Multi-Objective Reinforcement Learning
    Rolf, Matthias
    10TH IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL-EPIROB 2020), 2020,
  • [12] Actor-Critic Reinforcement Learning for Control With Stability Guarantee
    Han, Minghao
    Zhang, Lixian
    Wang, Jun
    Pan, Wei
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (04) : 6217 - 6224
  • [13] Actor-Critic for Multi-Agent Reinforcement Learning with Self-Attention
    Zhao, Juan
    Zhu, Tong
    Xiao, Shuo
    Gao, Zongqian
    Sun, Hao
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (09)
  • [14] MULTI-STEP ACTOR-CRITIC FRAMEWORK FOR REINFORCEMENT LEARNING IN CONTINUOUS CONTROL
    Huang T.
    Chen G.
    Journal of Applied and Numerical Optimization, 2023, 5 (02): : 189 - 200
  • [15] MARS: Malleable Actor-Critic Reinforcement Learning Scheduler
    Baheri, Betis
    Tronge, Jacob
    Fang, Bo
    Li, Ang
    Chaudhary, Vipin
    Guan, Qiang
    2022 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE, IPCCC, 2022,
  • [16] Intensive versus non-intensive actor-critic reinforcement learning algorithms
    Wawrzynski, P
    Pacut, A
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING - ICAISC 2004, 2004, 3070 : 934 - 941
  • [17] Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning
    Veeriah, Vivek
    van Seijen, Harm
    Sutton, Richard S.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 556 - 564
  • [18] Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
    Li, Qiang
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Song, Shibin
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 4248 - 4254
  • [19] Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm
    Kim, Youngjae
    Hussain, Manzoor
    Suh, Jae-Won
    Hong, Jang-Eui
    2022 THIRTEENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2022, : 320 - 325
  • [20] Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning
    Yue, Wangyang
    Zhou, Yuan
    Zhang, Xiaochuan
    Hua, Yuchen
    Li, Minne
    Fan, Zunlin
    Wang, Zhiyuan
    Kou, Guang
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT IV, 2024, 15019 : 325 - 339