Preference-based reinforcement learning: a formal framework and a policy iteration algorithm

Cited by: 60
Authors
Fuernkranz, Johannes [1]
Huellermeier, Eyke [2]
Cheng, Weiwei [2]
Park, Sang-Hyeun [1]
Affiliations
[1] Tech Univ Darmstadt, Dept Comp Sci, Darmstadt, Germany
[2] Univ Marburg, Dept Math & Comp Sci, Marburg, Germany
Keywords
Reinforcement learning; Preference learning; Qualitative decision; Uncertainty; Prediction; Advice
DOI
10.1007/s10994-012-5313-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
This paper makes a first step toward the integration of two subfields of machine learning, namely preference learning and reinforcement learning (RL). An important motivation for a preference-based approach to reinforcement learning is the observation that in many real-world domains, numerical feedback signals are not readily available, or are defined arbitrarily in order to satisfy the needs of conventional RL algorithms. Instead, we propose an alternative framework for reinforcement learning, in which qualitative reward signals can be directly used by the learner. The framework may be viewed as a generalization of the conventional RL framework in which only a partial order between policies is required instead of the total order induced by their respective expected long-term reward. Therefore, building on novel methods for preference learning, our general goal is to equip the RL agent with qualitative policy models, such as ranking functions that allow for sorting its available actions from most to least promising, as well as algorithms for learning such models from qualitative feedback. As a proof of concept, we realize a first simple instantiation of this framework that defines preferences based on utilities observed for trajectories. To that end, we build on an existing method for approximate policy iteration based on roll-outs. While this approach is based on the use of classification methods for generalization and policy learning, we make use of a specific type of preference learning method called label ranking. Advantages of preference-based approximate policy iteration are illustrated by means of two case studies.
Pages: 123 - 156
Number of pages: 34
Related papers
50 in total
  • [21] Interactive preference analysis: A reinforcement learning framework
    Hu, Xiao
    Kang, Siqin
    Ren, Long
    Zhu, Shaokeng
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 319 (03) : 983 - 998
  • [22] Generalized Policy Iteration-based Reinforcement Learning Algorithm for Optimal Control of Unknown Discrete-time Systems
    Lin, Mingduo
    Zhao, Bo
    Liu, Derong
    Liu, Xi
    Luo, Fangchao
    [J]. PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021), 2021 : 3650 - 3655
  • [23] Preference-based Online Learning with Dueling Bandits: A Survey
    Bengs, Viktor
    Busa-Fekete, Robert
    El Mesaoudi-Paul, Adil
    Huellermeier, Eyke
    [J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [24] A Survey of Preference-Based Online Learning with Bandit Algorithms
    Busa-Fekete, Robert
    Huellermeier, Eyke
    [J]. ALGORITHMIC LEARNING THEORY (ALT 2014), 2014, 8776 : 18 - 39
  • [25] Preference-based experience sharing scheme for multi-agent reinforcement learning in multi-target environments
    Zuo, Xuan
    Zhang, Pu
    Li, Hui-Yan
    Liu, Zhun-Ga
    [J]. EVOLVING SYSTEMS, 2024, 15 (05) : 1681 - 1699
  • [26] Scaling policy iteration based reinforcement learning for unknown discrete-time linear systems
    Pang, Zhen
    Tang, Shengda
    Cheng, Jun
    He, Shuping
    [J]. AUTOMATICA, 2025, 176
  • [27] A novel movies recommendation algorithm based on reinforcement learning with DDPG policy
    Zhou, Qiaoling
    [J]. INTERNATIONAL JOURNAL OF INTELLIGENT COMPUTING AND CYBERNETICS, 2020, 13 (01) : 67 - 79
  • [28] A Policy-Based Reinforcement Learning Algorithm for Intelligent Train Control
    Zhang M.
    Zhang Q.
    Liu W.
    Zhou B.
    [J]. Tiedao Xuebao/Journal of the China Railway Society, 2020, 42 (01) : 69 - 75
  • [29] A HEART FAILURE PREDICTION ALGORITHM BASED ON IMPROVED REINFORCEMENT LEARNING FRAMEWORK
    Zhang, Yijie
    Yang, Xiangbo
    [J]. JOURNAL OF MECHANICS IN MEDICINE AND BIOLOGY, 2024, 24 (08)
  • [30] Reinforcement Learning Control of a Real Mobile Robot Using Approximate Policy Iteration
    Zhang, Pengchen
    Xu, Xin
    Liu, Chunming
    Yuan, Qiping
    [J]. ADVANCES IN NEURAL NETWORKS - ISNN 2009, PT 3, PROCEEDINGS, 2009, 5553 : 278 - 288