Information Directed Reward Learning for Reinforcement Learning

Cited by: 0
Authors
Lindner, David [1 ]
Turchetta, Matteo [1 ]
Tschiatschek, Sebastian [2 ]
Ciosek, Kamil [3 ]
Krause, Andreas [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] Univ Vienna, Dept Comp Sci, Vienna, Austria
[3] Spotify, Stockholm, Sweden
Keywords
REGRET;
DOI
Not available
CLC classification
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible. To this end, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward and selects queries that maximize the information gain about the difference in return between plausibly optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, it achieves similar or better performance with significantly fewer queries by shifting the focus from reducing the reward approximation error to improving the policy induced by the reward model. We support our findings with extensive evaluations in multiple environments and with different query types.
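The query-selection rule described in the abstract can be sketched for the simple special case of a Bayesian linear reward model, where information gain about the return difference between two candidate policies has a closed form via variance reduction. The feature dimension, the two "plausibly optimal" policies, their feature expectations, and the candidate query set below are all illustrative assumptions, not details taken from the paper:

```python
import numpy as np

# Toy setup (illustrative, not from the paper):
# reward r(s) = w @ phi(s), with a Gaussian prior over the weights w.
d = 3                      # feature dimension
prior_cov = np.eye(d)      # prior covariance of w
noise_var = 0.1            # noise variance of the expert's reward labels

# Feature expectations of two candidate (plausibly optimal) policies;
# their return difference is the scalar w @ delta.
mu_a = np.array([1.0, 0.5, 0.0])
mu_b = np.array([0.8, 0.0, 0.7])
delta = mu_a - mu_b

# Candidate single-state queries the expert could label with a reward.
queries = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([0.0, 0.0, 1.0]),
           np.array([0.5, 0.5, 0.0])]

def posterior_cov(cov, phi, sigma2):
    """Gaussian posterior covariance of w after one noisy linear observation."""
    phi = phi.reshape(-1, 1)
    gain = cov @ phi / (phi.T @ cov @ phi + sigma2)
    return cov - gain @ (phi.T @ cov)

def info_gain(cov, phi, delta, sigma2):
    """For Gaussians, the information gain about the scalar w @ delta is
    half the log-ratio of its prior to posterior variance."""
    var_before = delta @ cov @ delta
    var_after = delta @ posterior_cov(cov, phi, sigma2) @ delta
    return 0.5 * np.log(var_before / var_after)

gains = [info_gain(prior_cov, q, delta, noise_var) for q in queries]
best = int(np.argmax(gains))
print("selected query:", best, "gains:", np.round(gains, 3))
```

In this toy example the rule selects the third query, since `delta` is largest in the third feature: a query is valuable only insofar as it reduces uncertainty about which candidate policy has higher return, not uncertainty about the reward everywhere. This is the shift the abstract describes, from minimizing reward approximation error to improving the induced policy.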
Pages: 13
Related papers
50 items in total
  • [1] Regret Bounds for Information-Directed Reinforcement Learning
    Hao, Botao
    Lattimore, Tor
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [2] Reinforcement learning reward functions for unsupervised learning
    Fyfe, Colin
    Lai, Pei Ling
    ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 1, PROCEEDINGS, 2007, 4491 : 397 - +
  • [3] Reward Reports for Reinforcement Learning
    Gilbert, Thomas Krendl
    Lambert, Nathan
    Dean, Sarah
    Zick, Tom
    Snoswell, Aaron
    Mehta, Soham
    PROCEEDINGS OF THE 2023 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2023, 2023, : 84 - 130
  • [4] Reward, motivation, and reinforcement learning
    Dayan, P
    Balleine, BW
    NEURON, 2002, 36 (02) : 285 - 298
  • [5] EFFICIENT AND STABLE INFORMATION DIRECTED EXPLORATION FOR CONTINUOUS REINFORCEMENT LEARNING
    Chen, Mingzhe
    Xiao, Xi
    Zhang, Wanpeng
    Gao, Xiaotian
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4023 - 4027
  • [6] Actively learning costly reward functions for reinforcement learning
    Eberhard, Andre
    Metni, Houssam
    Fahland, Georg
    Stroh, Alexander
    Friederich, Pascal
    MACHINE LEARNING-SCIENCE AND TECHNOLOGY, 2024, 5 (01):
  • [7] Learning classifier system with average reward reinforcement learning
    Zang, Zhaoxiang
    Li, Dehua
    Wang, Junying
    Xia, Dan
    KNOWLEDGE-BASED SYSTEMS, 2013, 40 : 58 - 71
  • [8] Reinforcement Learning for Data Preparation with Active Reward Learning
    Berti-Equille, Laure
    INTERNET SCIENCE, INSCI 2019, 2019, 11938 : 121 - 132
  • [9] Active Learning for Reward Estimation in Inverse Reinforcement Learning
    Lopes, Manuel
    Melo, Francisco
    Montesano, Luis
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, PT II, 2009, 5782 : 31 - +
  • [10] Learning Reward Machines for Partially Observable Reinforcement Learning
    Icarte, Rodrigo Toro
    Waldie, Ethan
    Klassen, Toryn Q.
    Valenzano, Richard
    Castro, Margarita P.
    McIlraith, Sheila A.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32