Information Directed Reward Learning for Reinforcement Learning

Cited by: 0

Authors:

Lindner, David [1]

Turchetta, Matteo [1]

Tschiatschek, Sebastian [2]

Ciosek, Kamil [3]

Krause, Andreas [1]

Affiliations:

[1] Swiss Fed Inst Technol, Dept Comp Sci, Zurich, Switzerland

[2] Univ Vienna, Dept Comp Sci, Vienna, Austria

[3] Spotify, Stockholm, Sweden

Source:

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34

Keywords:

REGRET;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

For many reinforcement learning (RL) applications, specifying a reward is difficult. This paper considers an RL setting where the agent obtains information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward that allows standard RL algorithms to achieve high expected returns with as few expert queries as possible. To this end, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward and selects queries that maximize the information gain about the difference in return between plausibly optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, it achieves similar or better performance with significantly fewer queries by shifting the focus from reducing the reward approximation error to improving the policy induced by the reward model. We support our findings with extensive evaluations in multiple environments and with different query types.
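
The query-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation; it assumes a linear reward r(s) = w · phi(s) with a Gaussian belief over w, expert queries that return a noisy reward for a single state, and exactly two plausibly optimal policies summarized by their expected feature counts. The names `information_gain`, `select_query`, `d`, and `noise_var` are hypothetical. Under these assumptions the return difference G = w · d and the query answer are jointly Gaussian, so the information gain has a closed form.

```python
import numpy as np

def information_gain(Sigma, d, phi, noise_var):
    """Mutual information between G = w @ d and y = w @ phi + noise,
    for w ~ N(mu, Sigma). Assumes d @ Sigma @ d > 0."""
    prior_var = d @ Sigma @ d              # variance of the return difference
    cov = d @ Sigma @ phi                  # covariance between G and the query answer
    obs_var = phi @ Sigma @ phi + noise_var
    post_var = prior_var - cov**2 / obs_var  # variance of G after observing y
    return 0.5 * np.log(prior_var / post_var)

def select_query(Sigma, d, candidates, noise_var=0.1):
    """Return the index of the candidate query with the largest information gain,
    together with the gains of all candidates."""
    gains = [information_gain(Sigma, d, phi, noise_var) for phi in candidates]
    return int(np.argmax(gains)), gains

# Hypothetical example: three states with one-hot features.
Sigma = np.eye(3)                    # prior covariance over reward weights
d = np.array([1.0, -1.0, 0.0])       # feature-count difference of the two policies
candidates = np.eye(3)               # one possible query per state
best, gains = select_query(Sigma, d, candidates)
```

In this toy setup the two policies visit the third state equally often, so querying it has zero information gain about which policy is better, even though the reward there is just as uncertain as elsewhere. This mirrors the abstract's point about focusing on the induced policy instead of the reward approximation error.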


Pages: 13
