Preference-based reinforcement learning: a formal framework and a policy iteration algorithm

Cited: 60
Authors
Fürnkranz, Johannes [1]
Hüllermeier, Eyke [2]
Cheng, Weiwei [2 ]
Park, Sang-Hyeun [1 ]
Affiliations
[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany
[2] Univ Marburg, Dept Math & Comp Sci, Marburg, Germany
Keywords
Reinforcement learning; Preference learning; Qualitative decision; Uncertainty; Prediction; Advice
DOI
10.1007/s10994-012-5313-8
CLC Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper takes a first step toward integrating two subfields of machine learning: preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is that in many real-world domains, numerical feedback signals are either not readily available or are defined arbitrarily to satisfy the needs of conventional RL algorithms. We therefore propose an alternative framework for reinforcement learning in which qualitative reward signals can be used directly by the learner. The framework can be viewed as a generalization of the conventional RL setting: only a partial order between policies is required, instead of the total order induced by their expected long-term rewards. Building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that sort the available actions from most to least promising, together with algorithms for learning such models from qualitative feedback. As a proof of concept, we realize a first simple instantiation of this framework that defines preferences based on the utilities observed for trajectories. To this end, we build on an existing method for approximate policy iteration based on roll-outs. Whereas that approach uses classification methods for generalization and policy learning, we employ a specific type of preference learning method called label ranking. The advantages of preference-based approximate policy iteration are illustrated by means of two case studies.
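The abstract outlines the core loop of preference-based approximate policy iteration: roll out candidate actions, compare the observed trajectory utilities to obtain pairwise action preferences, and train a ranking model in place of a classifier. The sketch below illustrates that loop under strong simplifying assumptions; it is not the authors' implementation. It assumes a toy chain MDP and substitutes a simple pairwise perceptron for the paper's label-ranking method, and all names (chain_step, rollout_return, collect_preferences, train_ranker) are hypothetical.

import numpy as np

# Hypothetical toy chain MDP: states 0..5, reward only while at the rightmost state.
N_STATES = 6
ACTIONS = (0, 1)              # 0: move left, 1: move right
GAMMA, HORIZON = 0.95, 20

def chain_step(s, a):
    s2 = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)

def rollout_return(s, a, policy):
    """Observed utility of one roll-out that starts with action a, then follows policy."""
    total, disc = 0.0, 1.0
    for _ in range(HORIZON):
        s, r = chain_step(s, a)
        total += disc * r
        disc *= GAMMA
        a = policy(s)
    return total

def features(s):
    x = np.zeros(N_STATES)    # one-hot state features
    x[s] = 1.0
    return x

def collect_preferences(policy, n_rollouts=5):
    """Pairwise action preferences per state: a_i is preferred to a_j iff its
    roll-outs achieve the higher mean utility. Only comparisons are used."""
    prefs = []
    for s in range(N_STATES):
        u = {a: np.mean([rollout_return(s, a, policy) for _ in range(n_rollouts)])
             for a in ACTIONS}
        for ai in ACTIONS:
            for aj in ACTIONS:
                if ai < aj and u[ai] != u[aj]:
                    prefs.append((s, ai, aj) if u[ai] > u[aj] else (s, aj, ai))
    return prefs

# Stand-in for the label ranker (an assumption, not the paper's method): one
# linear score per action, trained with a pairwise perceptron update so that
# preferred actions outscore dispreferred ones.
W = np.zeros((len(ACTIONS), N_STATES))

def train_ranker(prefs, epochs=20, lr=0.1):
    for _ in range(epochs):
        for s, a_good, a_bad in prefs:
            x = features(s)
            if W[a_good] @ x <= W[a_bad] @ x:   # ranking constraint violated
                W[a_good] += lr * x
                W[a_bad] -= lr * x

def greedy_policy(s):
    return int(np.argmax(W @ features(s)))     # rank actions, pick the top one

# Preference-based policy-iteration sweep: evaluate by roll-outs, improve the ranker.
for _ in range(3):
    train_ranker(collect_preferences(greedy_policy))
print([greedy_policy(s) for s in range(N_STATES)])  # e.g. [0, 0, 1, 1, 1, 1]

Note that the learner only ever compares roll-out utilities; their absolute scale never enters the update, which is the sense in which the feedback is qualitative rather than numerical.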
Pages: 123-156
Page count: 34