Information Directed Reward Learning for Reinforcement Learning

Cited by: 0
Authors
Lindner, David [1 ]
Turchetta, Matteo [1 ]
Tschiatschek, Sebastian [2 ]
Ciosek, Kamil [3 ]
Krause, Andreas [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] Univ Vienna, Dept Comp Sci, Vienna, Austria
[3] Spotify, Stockholm, Sweden
Keywords
REGRET;
DOI
Not available
CLC Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible. To this end, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward and selects queries that maximize the information gain about the difference in return between plausibly optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, it achieves similar or better performance with significantly fewer queries by shifting the focus from reducing the reward approximation error to improving the policy induced by the reward model. We support our findings with extensive evaluations in multiple environments and with different query types.
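To make the query-selection idea in the abstract concrete, here is a rough sketch, not the authors' implementation. It assumes a Bayesian linear reward model with a Gaussian posterior over weights w: a policy's expected return is phi · w for its feature expectations phi, and a query observes x · w plus Gaussian noise. All function names are hypothetical; the full method additionally maximizes the gain over all pairs of plausibly optimal policies rather than a single fixed pair.

```python
import numpy as np

def posterior_after_query(Sigma, x, noise_var):
    """Rank-one Gaussian update of the weight covariance after
    observing the noisy linear measurement x . w + noise."""
    Sx = Sigma @ x
    return Sigma - np.outer(Sx, Sx) / (x @ Sx + noise_var)

def information_gain(Sigma, delta, x, noise_var):
    """Mutual information between the query outcome and the return
    difference delta . w (closed form for jointly Gaussian variables)."""
    var_before = delta @ Sigma @ delta
    var_after = delta @ posterior_after_query(Sigma, x, noise_var) @ delta
    return 0.5 * np.log(var_before / var_after)

def select_query(Sigma, phi_candidates, queries, noise_var=0.1):
    """Pick the query most informative about the difference in return
    between the first two candidate policies (rows of phi_candidates).
    IDRL proper would consider all pairs of plausibly optimal policies."""
    delta = phi_candidates[0] - phi_candidates[1]
    gains = [information_gain(Sigma, delta, x, noise_var) for x in queries]
    return int(np.argmax(gains))
```

Note how the criterion differs from plain uncertainty sampling: a query whose feature direction is orthogonal to the return difference delta has zero information gain and is never chosen, even if the reward model is very uncertain about it.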
Pages: 13
Related Papers
50 items in total
  • [21] Distributional Reward Decomposition for Reinforcement Learning
    Lin, Zichuan
    Zhao, Li
    Yang, Derek
    Qin, Tao
    Yang, Guangwen
    Liu, Tie-Yan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [22] Reward learning: Reinforcement, incentives, and expectations
    Berridge, KC
    PSYCHOLOGY OF LEARNING AND MOTIVATION: ADVANCES IN RESEARCH AND THEORY, VOL 40, 2001, 40 : 223 - 278
  • [23] Reward shaping using directed graph convolution neural networks for reinforcement learning and games
    Sang, Jianghui
    Ahmad Khan, Zaki
    Yin, Hengfu
    Wang, Yupeng
    FRONTIERS IN PHYSICS, 2023, 11
  • [24] Learning reward machines: A study in partially observable reinforcement learning 
    Icarte, Rodrigo Toro
    Klassen, Toryn Q.
    Valenzano, Richard
    Castro, Margarita P.
    Waldie, Ethan
    McIlraith, Sheila A.
    ARTIFICIAL INTELLIGENCE, 2023, 323
  • [25] Reward Learning for Efficient Reinforcement Learning in Extractive Document Summarisation
    Gao, Yang
    Meyer, Christian M.
    Mesgar, Mohsen
    Gurevych, Iryna
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2350 - 2356
  • [26] Feasible Q-Learning for Average Reward Reinforcement Learning
    Jin, Ying
    Blanchet, Jose
    Gummadi, Ramki
    Zhou, Zhengyuan
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 238, 2024, 238
  • [27] Historical Information Stability based Reward for Reinforcement Learning in Continuous Integration Testing
    Cao, Tiange
    Li, Zheng
    Zhao, Ruilian
    Yang, Yang
    2021 IEEE 21ST INTERNATIONAL CONFERENCE ON SOFTWARE QUALITY, RELIABILITY AND SECURITY (QRS 2021), 2021, : 231 - 242
  • [28] Reward Machines: Exploiting Reward Function Structure in Reinforcement Learning
    Icarte R.T.
    Klassen T.Q.
    Valenzano R.
    McIlraith S.A.
    Journal of Artificial Intelligence Research, 2022, 73 : 173 - 208
  • [30] Reward function and initial values: Better choices for accelerated goal-directed reinforcement learning
    Matignon, Laetitia
    Laurent, Guillaume J.
    Le Fort-Piat, Nadine
    ARTIFICIAL NEURAL NETWORKS - ICANN 2006, PT 1, 2006, 4131 : 840 - 849