Actor-critic multi-objective reinforcement learning for non-linear utility functions

Cited by: 5
Authors
Reymond, Mathieu [1 ]
Hayes, Conor F. [2 ]
Steckelmacher, Denis [1 ]
Roijers, Diederik M. [1 ,3 ]
Nowé, Ann [1 ]
Affiliations
[1] Vrije Univ Brussel, Brussels, Belgium
[2] Univ Galway, Galway, Ireland
[3] HU Univ Appl Sci Utrecht, Utrecht, Netherlands
Keywords
Reinforcement learning; Multi-objective reinforcement learning; Non-linear utility functions; Expected scalarized return; SETS
DOI
10.1007/s10458-023-09604-x
CLC number
TP [automation technology; computer technology]
Discipline code
0812
Abstract
We propose a novel multi-objective reinforcement learning algorithm that learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art approaches, both in terms of learning efficiency and in terms of the solution concept. A key insight is that a critic which learns a multivariate distribution over the returns, combined with the rewards accumulated so far, lets us optimize the utility function directly, even when it is non-linear. This vastly increases the range of problems that can be solved compared to single-objective methods or multi-objective methods that require linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
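The reason a critic must model a full multivariate distribution over returns, rather than just expected returns, is that for a non-linear utility u, the expected utility E[u(R)] generally differs from the utility of the expected return u(E[R]). The sketch below illustrates this gap numerically; the utility function, the Gaussian return samples, and the accumulated-reward vector are hypothetical stand-ins for illustration, not the paper's actual implementation.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's method):
# for a non-linear utility u, E[u(R)] != u(E[R]), so a critic that only
# estimates expected returns cannot optimize the utility directly.

def u(returns):
    """Hypothetical non-linear utility over two objectives: their minimum."""
    return np.minimum(returns[..., 0], returns[..., 1])

rng = np.random.default_rng(0)
# Stand-in for samples from a learned multivariate return distribution
# (2 objectives, 10,000 Monte Carlo samples of the future return).
future_returns = rng.normal(loc=[2.0, 1.0], scale=0.5, size=(10_000, 2))
accumulated = np.array([1.0, 3.0])  # rewards gathered so far in the episode

# Utility applied per sample to the total return, then averaged
# (the expected-scalarized-return criterion):
expected_utility = u(accumulated + future_returns).mean()

# Utility of the mean total return (all a non-distributional critic sees):
utility_of_expectation = u(accumulated + future_returns.mean(axis=0))

print(expected_utility, utility_of_expectation)  # the two values differ
```

Because this u is concave, Jensen's inequality makes the expected utility strictly smaller than the utility of the expectation; a policy ranked by the latter can therefore be suboptimal under the former, which motivates combining a distributional critic with the accumulated reward.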
Pages: 30
Related papers
50 records in total
  • [11] The Need for MORE: Need Systems as Non-Linear Multi-Objective Reinforcement Learning
    Rolf, Matthias
    10TH IEEE INTERNATIONAL CONFERENCE ON DEVELOPMENT AND LEARNING AND EPIGENETIC ROBOTICS (ICDL-EPIROB 2020), 2020,
  • [12] Actor-Critic Reinforcement Learning for Control With Stability Guarantee
    Han, Minghao
    Zhang, Lixian
    Wang, Jun
    Pan, Wei
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2020, 5 (04) : 6217 - 6224
  • [13] Actor-Critic for Multi-Agent Reinforcement Learning with Self-Attention
    Zhao, Juan
    Zhu, Tong
    Xiao, Shuo
    Gao, Zongqian
    Sun, Hao
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2022, 36 (09)
  • [14] MULTI-STEP ACTOR-CRITIC FRAMEWORK FOR REINFORCEMENT LEARNING IN CONTINUOUS CONTROL
    Huang T.
    Chen G.
    Journal of Applied and Numerical Optimization, 2023, 5 (02): : 189 - 200
  • [15] MARS: Malleable Actor-Critic Reinforcement Learning Scheduler
    Baheri, Betis
    Tronge, Jacob
    Fang, Bo
    Li, Ang
    Chaudhary, Vipin
    Guan, Qiang
    2022 IEEE INTERNATIONAL PERFORMANCE, COMPUTING, AND COMMUNICATIONS CONFERENCE, IPCCC, 2022,
  • [16] Intensive versus non-intensive actor-critic reinforcement learning algorithms
    Wawrzynski, P
    Pacut, A
    ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING - ICAISC 2004, 2004, 3070 : 934 - 941
  • [17] Forward Actor-Critic for Nonlinear Function Approximation in Reinforcement Learning
    Veeriah, Vivek
    van Seijen, Harm
    Sutton, Richard S.
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 556 - 564
  • [18] Manipulator Motion Planning based on Actor-Critic Reinforcement Learning
    Li, Qiang
    Nie, Jun
    Wang, Haixia
    Lu, Xiao
    Song, Shibin
    2021 PROCEEDINGS OF THE 40TH CHINESE CONTROL CONFERENCE (CCC), 2021, : 4248 - 4254
  • [19] Evaluating Correctness of Reinforcement Learning based on Actor-Critic Algorithm
    Kim, Youngjae
    Hussain, Manzoor
    Suh, Jae-Won
    Hong, Jang-Eui
    2022 THIRTEENTH INTERNATIONAL CONFERENCE ON UBIQUITOUS AND FUTURE NETWORKS (ICUFN), 2022, : 320 - 325
  • [20] Asymmetric Actor-Critic for Adapting to Changing Environments in Reinforcement Learning
    Yue, Wangyang
    Zhou, Yuan
    Zhang, Xiaochuan
    Hua, Yuchen
    Li, Minne
    Fan, Zunlin
    Wang, Zhiyuan
    Kou, Guang
    ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING-ICANN 2024, PT IV, 2024, 15019 : 325 - 339