Multi-objective ω-Regular Reinforcement Learning

Cited by: 3
Authors
Hahn, Ernst Moritz [1]
Perez, Mateo [2]
Schewe, Sven [3, 4]
Somenzi, Fabio
Trivedi, Ashutosh [2]
Wojtczak, Dominik [3]
Affiliations
[1] Univ Twente, Fac Elect Engn Math & Comp Sci, Enschede, Netherlands
[2] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA
[3] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England
[4] Univ Colorado Boulder, Dept Elect Comp & Energy Engn, Boulder, CO USA
Funding
EU Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC); US National Science Foundation (NSF);
Keywords
Multi-objective reinforcement learning; omega-regular objectives; lexicographic preference; weighted preference; automata-theoretic reinforcement learning; MARKOV DECISION-PROCESSES; STOCHASTIC GAMES; MODEL CHECKING; DOPAMINE; LEVEL;
DOI
10.1145/3605950
Chinese Library Classification
TP31 [Computer Software];
Discipline classification codes
081202; 0835;
Abstract
The expanding role of reinforcement learning (RL) in safety-critical system design has promoted omega-automata as a way to express learning requirements (often non-Markovian) with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple omega-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple omega-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, MUNGOJERRIE, and we present an experimental evaluation of our technique on benchmark learning problems.
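The difference between the two preference models named in the abstract can be illustrated with a minimal Python sketch. It is not the article's reward translation and not part of MUNGOJERRIE; it only compares two hypothetical strategies by their vectors of satisfaction probabilities. The function names, the tolerance `tol`, and the example numbers are assumptions made for illustration.

```python
from typing import Sequence


def weighted_value(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted preference: score a strategy by the weighted sum of the
    probabilities with which it satisfies each objective."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    return sum(w * p for w, p in zip(weights, probs))


def lexicographic_better(a: Sequence[float], b: Sequence[float],
                         tol: float = 1e-9) -> bool:
    """Lexicographic preference: return True if `a` is strictly preferred
    to `b`. Objectives are listed from highest to lowest priority, and a
    lower-priority objective only matters when all higher ones tie
    (up to the illustrative tolerance `tol`)."""
    if len(a) != len(b):
        raise ValueError("a and b must have the same length")
    for pa, pb in zip(a, b):
        if pa > pb + tol:
            return True
        if pa < pb - tol:
            return False
    return False  # equal on every objective: neither is strictly preferred


# Hypothetical satisfaction probabilities for two objectives.
strategy_a = (0.9, 0.2)  # better on the first (higher-priority) objective
strategy_b = (0.8, 1.0)  # much better on the second objective
equal_weights = (0.5, 0.5)

prefers_a_lexicographically = lexicographic_better(strategy_a, strategy_b)
prefers_b_weighted = (weighted_value(strategy_b, equal_weights)
                      > weighted_value(strategy_a, equal_weights))
```

With these numbers the two models disagree: the lexicographic order picks `strategy_a` because it wins on the first objective, while equal weights favour `strategy_b` because its gain on the second objective outweighs its small loss on the first.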
Pages: 24