Multi-objective ω-Regular Reinforcement Learning

Cited by: 3

Authors:

Hahn, Ernst Moritz [1]

Perez, Mateo [2]

Schewe, Sven [3,4]

Somenzi, Fabio

Trivedi, Ashutosh [2]

Wojtczak, Dominik [3]

Affiliations:

[1] Univ Twente, Fac Elect Engn Math & Comp Sci, Enschede, Netherlands

[2] Univ Colorado Boulder, Dept Comp Sci, Boulder, CO USA

[3] Univ Liverpool, Dept Comp Sci, Liverpool, Merseyside, England

[4] Univ Colorado Boulder, Dept Elect Comp & Energy Engn, Boulder, CO USA

Source:

FORMAL ASPECTS OF COMPUTING | 2023, Vol. 35, No. 2

Funding:

EU Horizon 2020; UK Engineering and Physical Sciences Research Council (EPSRC); US National Science Foundation (NSF)

Keywords:

Multi-objective reinforcement learning; omega-regular objectives; lexicographic preference; weighted preference; automata-theoretic reinforcement learning; MARKOV DECISION-PROCESSES; STOCHASTIC GAMES; MODEL CHECKING; DOPAMINE; LEVEL;

DOI:

10.1145/3605950

Chinese Library Classification (CLC) number:

TP31 [Computer software]

Discipline classification codes:

081202; 0835

Abstract:

The expanding role of reinforcement learning (RL) in safety-critical system design has promoted omega-automata as a way to express learning requirements, often non-Markovian, with greater ease of expression and interpretation than scalar reward signals. However, real-world sequential decision-making situations often involve multiple, potentially conflicting, objectives. Two dominant approaches to expressing relative preferences over multiple objectives are: (1) weighted preference, where the decision maker provides scalar weights for the various objectives, and (2) lexicographic preference, where the decision maker provides an order over the objectives such that any amount of satisfaction of a higher-ordered objective is preferable to any amount of a lower-ordered one. In this article, we study and develop RL algorithms to compute optimal strategies in Markov decision processes against multiple omega-regular objectives under weighted and lexicographic preferences. We provide a translation from multiple omega-regular objectives to a scalar reward signal that is both faithful (maximising reward means maximising the probability of achieving the objectives under the corresponding preference) and effective (RL quickly converges to optimal strategies). We have implemented the translations in a formal reinforcement learning tool, MUNGOJERRIE, and we present an experimental evaluation of our technique on benchmark learning problems.
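The difference between the two preference models named in the abstract can be illustrated with a minimal Python sketch. This is not the article's reward translation and does not use MUNGOJERRIE. It only compares hypothetical strategies by a vector of per-objective satisfaction probabilities. The strategy names, the probabilities, and the helper functions `weighted_value`, `best_weighted`, and `best_lexicographic` are all assumptions made for illustration.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical data: for each strategy, the probability of satisfying
# objective 1 (higher priority) and objective 2 (lower priority).
STRATEGIES: Dict[str, Tuple[float, ...]] = {
    "A": (0.9, 0.1),
    "B": (0.8, 1.0),
}


def weighted_value(probs: Sequence[float], weights: Sequence[float]) -> float:
    """Weighted preference: scalarise by a weighted sum of probabilities."""
    if len(probs) != len(weights):
        raise ValueError("probs and weights must have the same length")
    return sum(w * p for w, p in zip(weights, probs))


def best_weighted(strategies: Dict[str, Tuple[float, ...]],
                  weights: Sequence[float]) -> str:
    """Return the strategy name with the largest weighted sum."""
    return max(strategies, key=lambda name: weighted_value(strategies[name], weights))


def best_lexicographic(strategies: Dict[str, Tuple[float, ...]]) -> str:
    """Lexicographic preference: compare objective 1 first, and use later
    objectives only to break ties. Python tuples already compare this way."""
    return max(strategies, key=lambda name: tuple(strategies[name]))


if __name__ == "__main__":
    print("weighted (1, 1):", best_weighted(STRATEGIES, (1.0, 1.0)))
    print("lexicographic:  ", best_lexicographic(STRATEGIES))
```

With equal weights, strategy B wins because its large gain on the second objective outweighs its small loss on the first. Under the lexicographic order, strategy A wins because any improvement on the higher-ordered objective takes precedence.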


Pages: 24

