Actor-critic multi-objective reinforcement learning for non-linear utility functions

Cited by: 5
Authors
Reymond, Mathieu [1]
Hayes, Conor F. [2]
Steckelmacher, Denis [1]
Roijers, Diederik M. [1, 3]
Nowé, Ann [1]
Affiliations
[1] Vrije Univ Brussel, Brussels, Belgium
[2] Univ Galway, Galway, Ireland
[3] HU Univ Appl Sci Utrecht, Utrecht, Netherlands
Keywords
Reinforcement learning; Multi-objective reinforcement learning; Non-linear utility functions; Expected scalarized return; SETS
DOI
10.1007/s10458-023-09604-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for state-of-the-art (SOTA) approaches, both in terms of learning efficiency and in terms of the solution concept. A key insight is that, by proposing a critic that learns a multi-variate distribution over the returns, which is then combined with the accumulated rewards, we can directly optimize the utility function, even if it is non-linear. This allows us to vastly increase the range of problems that can be solved compared to those that can be handled by single-objective methods or by multi-objective methods requiring linear utility functions, while avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks and show that it learns effectively where baseline approaches fail.
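The key insight above can be illustrated with a minimal sketch. The idea is to combine the rewards accumulated so far with a distribution over future return vectors, then take the expectation of the utility. This is not the authors' algorithm or critic. It is a toy evaluation step under several assumptions:

- the return distribution is categorical, given as a hypothetical list of `(probability, return_vector)` pairs;
- discounting is ignored;
- the product utility is a hypothetical non-linear utility;
- all function names are illustrative.

The sketch contrasts the expected scalarized return, E[u(accrued + future)], with the utility of the expected return, u(accrued + E[future]). For a non-linear utility, only the former separates the two actions.

```python
from typing import Callable, Mapping, Sequence, Tuple

Vector = Tuple[float, ...]
# Hypothetical categorical distribution over future return vectors.
Distribution = Sequence[Tuple[float, Vector]]


def expected_scalarized_value(accrued: Vector, future: Distribution,
                              utility: Callable[[Vector], float]) -> float:
    """E[u(accrued + future return)] under a categorical return distribution."""
    if abs(sum(p for p, _ in future) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Add the accrued reward to each possible future return before applying u.
    return sum(p * utility(tuple(a + f for a, f in zip(accrued, ret)))
               for p, ret in future)


def scalarized_expected_value(accrued: Vector, future: Distribution,
                              utility: Callable[[Vector], float]) -> float:
    """u(accrued + E[future return]): applies u only to the mean, for comparison."""
    mean = [sum(p * ret[i] for p, ret in future) for i in range(len(accrued))]
    return utility(tuple(a + m for a, m in zip(accrued, mean)))


def product_utility(r: Vector) -> float:
    # Hypothetical non-linear utility over two objectives.
    return r[0] * r[1]


def greedy_action(accrued: Vector, actions: Mapping[str, Distribution],
                  utility: Callable[[Vector], float]) -> str:
    """Pick the action whose return distribution maximizes expected utility."""
    return max(actions, key=lambda a: expected_scalarized_value(accrued, actions[a], utility))


accrued = (1.0, 0.0)
actions = {
    "safe": [(1.0, (1.0, 1.0))],
    "risky": [(0.5, (0.0, 2.0)), (0.5, (2.0, 0.0))],
}
```

Both actions have the same mean future return (1, 1), so `scalarized_expected_value` gives 2.0 for each. With the accrued reward (1, 0), however, `expected_scalarized_value` gives 2.0 for `"safe"` and 1.0 for `"risky"`, so `greedy_action(accrued, actions, product_utility)` returns `"safe"`. A critic that only estimates expected return vectors cannot draw this distinction.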
Pages: 30
Related papers (50 in total)
  • [1] Actor-critic multi-objective reinforcement learning for non-linear utility functions
    Reymond, Mathieu
    Hayes, Conor F.
    Steckelmacher, Denis
    Roijers, Diederik M.
    Nowé, Ann
    Autonomous Agents and Multi-Agent Systems, 2023, 37
  • [2] Multi-actor mechanism for actor-critic reinforcement learning
    Li, Lin
    Li, Yuze
    Wei, Wei
    Zhang, Yujia
    Liang, Jiye
    Information Sciences, 2023, 647
  • [3] A Prioritized objective actor-critic method for deep reinforcement learning
    Nguyen, Ngoc Duy
    Nguyen, Thanh Thi
    Vamplew, Peter
    Dazeley, Richard
    Nahavandi, Saeid
    Neural Computing and Applications, 2021, 33 (16): 10335-10349
  • [4] A heuristic multi-objective task scheduling framework for container-based clouds via actor-critic reinforcement learning
    Zhu, Lilu
    Wu, Feng
    Hu, Yanfeng
    Huang, Kai
    Tian, Xinmei
    Neural Computing and Applications, 2023, 35 (13): 9687-9710
  • [5] A World Model for Actor-Critic in Reinforcement Learning
    Panov, A. I.
    Ugadiarov, L. A.
    Pattern Recognition and Image Analysis, 2023, 33 (3): 467-477
  • [6] A fuzzy Actor-Critic reinforcement learning network
    Wang, Xue-Song
    Cheng, Yu-Hu
    Yi, Jian-Qiang
    Information Sciences, 2007, 177 (18): 3764-3781
  • [7] Research on actor-critic reinforcement learning in RoboCup
    Guo, He
    Liu, Tianying
    Wang, Yuxin
    Chen, Feng
    Fan, Jianming
    WCICA 2006: Sixth World Congress on Intelligent Control and Automation, Vols 1-12, Conference Proceedings, 2006: 205
  • [8] A multi-agent reinforcement learning using Actor-Critic methods
    Li, Chun-Gui
    Wang, Meng
    Yuan, Qing-Neng
    Proceedings of 2008 International Conference on Machine Learning and Cybernetics, Vols 1-7, 2008: 878-882