Information Directed Reward Learning for Reinforcement Learning

Cited by: 0

Authors
Lindner, David [1 ]
Turchetta, Matteo [1 ]
Tschiatschek, Sebastian [2 ]
Ciosek, Kamil [3 ]
Krause, Andreas [1 ]
Affiliations
[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland
[2] Univ Vienna, Dept Comp Sci, Vienna, Austria
[3] Spotify, Stockholm, Sweden
Keywords
REGRET;
DOI: not available
Chinese Library Classification
TP18 (Artificial Intelligence Theory)
Discipline Codes
081104; 0812; 0835; 1405
Abstract
For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible. To this end, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward and selects queries that maximize the information gain about the difference in return between plausibly optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, it achieves similar or better performance with significantly fewer queries by shifting the focus from reducing the reward approximation error to improving the policy induced by the reward model. We support our findings with extensive evaluations in multiple environments and with different query types.
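The query-selection criterion described in the abstract can be illustrated with a minimal sketch. It assumes a Bayesian *linear* reward model w ~ N(mu, Sigma) over feature weights, two candidate policies summarized by expected feature vectors `phi1` and `phi2`, and noisy scalar queries of the form y = x · w + eps; the function name and all parameters are illustrative, not from the paper's implementation. IDRL's idea is then to pick the query that most reduces uncertainty about the *return difference* d = (phi1 - phi2) · w, rather than about the reward everywhere:

```python
import numpy as np

def idrl_query_selection(Sigma, phi1, phi2, candidates, noise_var=0.1):
    """Pick the query maximizing information gain about the return
    difference d = (phi1 - phi2) @ w, for w ~ N(mu, Sigma).
    Illustrative sketch only; names and model are assumptions."""
    delta = phi1 - phi2
    var_before = delta @ Sigma @ delta  # prior variance of d
    best_gain, best_idx = -np.inf, None
    for i, x in enumerate(candidates):
        # Rank-one Gaussian posterior update for observing y = x @ w + noise
        s = x @ Sigma @ x + noise_var
        Sigma_post = Sigma - np.outer(Sigma @ x, Sigma @ x) / s
        var_after = delta @ Sigma_post @ delta
        # Entropy reduction of a 1-D Gaussian: 0.5 * log(variance ratio)
        gain = 0.5 * np.log(var_before / var_after)
        if gain > best_gain:
            best_gain, best_idx = gain, i
    return best_idx, best_gain
```

Note that a query orthogonal to `delta` yields zero gain even if it is very informative about the reward itself, which is the shift of focus the abstract describes: informativeness is measured against the policy comparison, not the reward approximation error.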
Pages: 13