Actor-critic multi-objective reinforcement learning for non-linear utility functions

Cited by: 5
Authors
Reymond, Mathieu [1]
Hayes, Conor F. [2]
Steckelmacher, Denis [1]
Roijers, Diederik M. [1, 3]
Nowe, Ann [1]
Affiliations
[1] Vrije Univ Brussel, Brussels, Belgium
[2] Univ Galway, Galway, Ireland
[3] HU Univ Appl Sci Utrecht, Utrecht, Netherlands
Keywords
Reinforcement learning; Multi-objective reinforcement learning; Non-linear utility functions; Expected scalarized return; SETS
DOI
10.1007/s10458-023-09604-x
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We propose a novel multi-objective reinforcement learning algorithm that successfully learns the optimal policy even for non-linear utility functions. Non-linear utility functions pose a challenge for SOTA approaches, both in terms of learning efficiency as well as the solution concept. A key insight is that, by proposing a critic that learns a multi-variate distribution over the returns, which is then combined with accumulated rewards, we can directly optimize on the utility function, even if it is non-linear. This allows us to vastly increase the range of problems that can be solved compared to those which can be handled by single-objective methods or multi-objective methods requiring linear utility functions, yet avoiding the need to learn the full Pareto front. We demonstrate our method on multiple multi-objective benchmarks, and show that it learns effectively where baseline approaches fail.
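The abstract's central point, that a non-linear utility must be applied to the distribution over returns rather than to its mean, can be illustrated with a small sketch. This is not the authors' implementation: the categorical representation of the return distribution (`atoms`, `probs`), the helper names, and the product utility are all assumptions chosen for illustration. The sketch compares the expected utility of the accumulated reward plus the future return with the utility of the expected return.

```python
from typing import Callable, Sequence

Vec = Sequence[float]


def expected_utility(accumulated: Vec, atoms: Sequence[Vec],
                     probs: Sequence[float],
                     utility: Callable[[Vec], float]) -> float:
    """E[u(accumulated + Z)], with Z a categorical distribution over return vectors.

    `atoms` and `probs` stand in for what a distributional critic might output;
    this representation is an assumption, not taken from the paper.
    """
    total = 0.0
    for atom, p in zip(atoms, probs):
        shifted = [a + z for a, z in zip(accumulated, atom)]
        total += p * utility(shifted)
    return total


def utility_of_expectation(accumulated: Vec, atoms: Sequence[Vec],
                           probs: Sequence[float],
                           utility: Callable[[Vec], float]) -> float:
    """u(accumulated + E[Z]): what is obtained if only the mean return is known."""
    mean = [sum(p * atom[i] for atom, p in zip(atoms, probs))
            for i in range(len(accumulated))]
    return utility([a + m for a, m in zip(accumulated, mean)])


def product_utility(r: Vec) -> float:
    """A hypothetical non-linear utility over two objectives."""
    return r[0] * r[1]


accumulated = (1.0, 1.0)              # rewards already collected this episode
atoms = [(4.0, 0.0), (0.0, 4.0)]      # two equally likely future return vectors
probs = [0.5, 0.5]

esr = expected_utility(accumulated, atoms, probs, product_utility)
ser = utility_of_expectation(accumulated, atoms, probs, product_utility)
print(esr, ser)  # 5.0 9.0
```

The two values differ because the utility is non-linear, which is why a critic that only estimates the mean return would not suffice to optimize the expected utility in this toy setting.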
Pages: 30
Related papers
50 entries in total
  • [21] An Actor-Critic Reinforcement Learning Control Approach for Discrete-Time Linear System with Uncertainty
    Chen, Hsin-Chang
    Lin, Yu-Chen
    Chang, Yu-Heng
    2018 INTERNATIONAL AUTOMATIC CONTROL CONFERENCE (CACS), 2018
  • [22] Lexicographic Actor-Critic Deep Reinforcement Learning for Urban Autonomous Driving
    Zhang, Hengrui
    Lin, Youfang
    Han, Sheng
    Lv, Kai
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2023, 72 (04) : 4308 - 4319
  • [23] Influences of Reinforcement and Choice Histories on Choice Behavior in Actor-Critic Learning
    Katahira K.
    Kimura K.
    Computational Brain & Behavior, 2023, 6 (2) : 172 - 194
  • [24] An Actor-Critic Reinforcement Learning Approach for Energy Harvesting Communications Systems
    Masadeh, Ala'eddin
    Wang, Zhengdao
    Kamal, Ahmed E.
    2019 28TH INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION AND NETWORKS (ICCCN), 2019
  • [25] On the sample complexity of actor-critic method for reinforcement learning with function approximation
    Kumar, Harshat
    Koppel, Alec
    Ribeiro, Alejandro
    MACHINE LEARNING, 2023, 112 (07) : 2433 - 2467
  • [26] Automated State Feature Learning for Actor-Critic Reinforcement Learning through NEAT
    Peng, Yiming
    Chen, Gang
    Holdaway, Scott
    Mei, Yi
    Zhang, Mengjie
    PROCEEDINGS OF THE 2017 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE COMPANION (GECCO'17 COMPANION), 2017 : 135 - 136
  • [27] Actor-Critic Reinforcement Learning for Automatic Left Atrial Appendage Segmentation
    Abdullah, Al Walid
    Yun, Il Dong
    PROCEEDINGS 2018 IEEE INTERNATIONAL CONFERENCE ON BIOINFORMATICS AND BIOMEDICINE (BIBM), 2018 : 609 - 612
  • [28] Supervised actor-critic reinforcement learning with action feedback for algorithmic trading
    Qizhou Sun
    Yain-Whar Si
    Applied Intelligence, 2023, 53 : 16875 - 16892
  • [29] On the sample complexity of actor-critic method for reinforcement learning with function approximation
    Harshat Kumar
    Alec Koppel
    Alejandro Ribeiro
    Machine Learning, 2023, 112 : 2433 - 2467
  • [30] USING ACTOR-CRITIC REINFORCEMENT LEARNING FOR CONTROL AND FLIGHT FORMATION OF QUADROTORS
    Torres, Edgar
    Xu, Lei
    Sardarmehni, Tohid
    PROCEEDINGS OF ASME 2022 INTERNATIONAL MECHANICAL ENGINEERING CONGRESS AND EXPOSITION, IMECE2022, VOL 5, 2022