Multi-objective ω-Regular Reinforcement Learning

Cited by: 3
Authors
Hahn, Ernst Moritz [1]
Perez, Mateo [2]
Schewe, Sven [3, 4]
Somenzi, Fabio
Trivedi, Ashutosh [2]
Wojtczak, Dominik [3]
Affiliations
[1] Univ Twente, Fac Elect Engn Math & Comp Sci, Enschede, Netherlands
[2] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA
[3] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England
[4] Univ Colorado Boulder, Dept Elect Comp & Energy Engn, Boulder, CO USA
Funding
EU Horizon 2020; UK Engineering and Physical Sciences Research Council; US National Science Foundation
Keywords
Multi-objective reinforcement learning; omega-regular objectives; lexicographic preference; weighted preference; automata-theoretic reinforcement learning; MARKOV DECISION-PROCESSES; STOCHASTIC GAMES; MODEL CHECKING; DOPAMINE; LEVEL;
DOI
10.1145/3605950
Chinese Library Classification
TP31 [Computer Software]
Subject classification codes
081202; 0835
Abstract
The expanding role of reinforcement learning (RL) in safety-critical system design has promoted omega-automata as a way to express learning requirements, which are often non-Markovian, with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to express relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple omega-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple omega-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, MUNGOJERRIE, and we present an experimental evaluation of our technique on benchmark learning problems.
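The difference between the two preference notions named in the abstract can be illustrated with a small sketch. This is not the article's reward translation and not part of MUNGOJERRIE; it only compares hypothetical strategies by made-up satisfaction probabilities, and the names `best_weighted`, `best_lexicographic`, and the candidate table are assumptions introduced for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical data: each strategy maps to the probabilities with which it
# satisfies objective 1 and objective 2 (listed in priority order).
Candidates = Dict[str, Tuple[float, ...]]


def weighted_value(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted preference: a single scalar, the weighted sum of probabilities."""
    return sum(w * p for w, p in zip(weights, probs))


def best_weighted(candidates: Candidates, weights: Sequence[float]) -> str:
    """Pick the strategy with the largest weighted sum."""
    return max(candidates, key=lambda name: weighted_value(candidates[name], weights))


def best_lexicographic(candidates: Candidates) -> str:
    """Lexicographic preference: compare objective 1 first, and use later
    objectives only to break ties. Python tuple comparison does exactly this."""
    return max(candidates, key=lambda name: tuple(candidates[name]))


# Illustrative numbers only (assumed, not taken from the article).
candidates: Candidates = {
    "A": (0.9, 0.1),  # slightly better on objective 1, poor on objective 2
    "B": (0.8, 0.9),  # slightly worse on objective 1, much better on objective 2
}

print(best_weighted(candidates, weights=(0.5, 0.5)))
print(best_lexicographic(candidates))
```

With equal weights the weighted preference favours strategy B, because its large gain on the second objective outweighs the small loss on the first. The lexicographic preference favours strategy A, because any improvement on the higher-ordered objective dominates whatever happens on the lower-ordered one.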
Pages: 24