Multi-objective ω-Regular Reinforcement Learning

Cited by: 3
Authors
Hahn, Ernst Moritz [1]
Perez, Mateo [2]
Schewe, Sven [3]
Somenzi, Fabio [4]
Trivedi, Ashutosh [2]
Wojtczak, Dominik [3]
Affiliations
[1] Univ Twente, Fac Elect Engn Math & Comp Sci, Enschede, Netherlands
[2] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA
[3] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England
[4] Univ Colorado Boulder, Dept Elect Comp & Energy Engn, Boulder, CO USA
Funding
EU Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC); US National Science Foundation (NSF);
Keywords
Multi-objective reinforcement learning; omega-regular objectives; lexicographic preference; weighted preference; automata-theoretic reinforcement learning; MARKOV DECISION-PROCESSES; STOCHASTIC GAMES; MODEL CHECKING; DOPAMINE; LEVEL;
DOI
10.1145/3605950
Chinese Library Classification
TP31 [Computer Software];
Subject Classification Codes
081202; 0835;
Abstract
The expanding role of reinforcement learning (RL) in safety-critical system design has promoted omega-automata as a way to express learning requirements, often non-Markovian ones, with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to expressing relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for the various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple omega-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple omega-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising the probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, MUNGOJERRIE, and we present an experimental evaluation of our technique on benchmark learning problems.
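To make the two preference schemes in the abstract concrete, the following is a minimal, hypothetical Python sketch (not taken from the paper or from MUNGOJERRIE; all names are illustrative) that compares two strategies by their vectors of per-objective satisfaction probabilities. Weighted preference scalarises the vector with decision-maker weights, while lexicographic preference compares objectives in priority order, so any gain on a higher-priority objective outweighs any amount of a lower-priority one.

```python
from typing import Sequence

def weighted_value(sat_probs: Sequence[float],
                   weights: Sequence[float]) -> float:
    """Weighted preference: scalarise the vector of per-objective
    satisfaction probabilities with decision-maker-supplied weights."""
    assert len(sat_probs) == len(weights)
    return sum(w * p for w, p in zip(weights, sat_probs))

def lex_prefers(a: Sequence[float], b: Sequence[float],
                eps: float = 1e-9) -> bool:
    """Lexicographic preference: compare objectives in priority order
    (index 0 is the highest priority). Any gain on a higher-priority
    objective dominates all lower-priority ones."""
    for pa, pb in zip(a, b):
        if pa > pb + eps:
            return True
        if pb > pa + eps:
            return False
    return False  # equal on every objective

# Satisfaction probabilities of two strategies for objectives
# (phi1, phi2), with phi1 the higher-priority objective:
a, b = (0.9, 0.1), (0.8, 1.0)
print(weighted_value(a, (0.5, 0.5)),
      weighted_value(b, (0.5, 0.5)))  # 0.5 vs 0.9: b wins under equal weights
print(lex_prefers(a, b))              # True: a wins lexicographically
```

Note how the same pair of outcome vectors is ranked oppositely under the two schemes, which is why the paper develops a faithful reward translation for each preference type separately.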
Pages: 24