Reward Identification in Inverse Reinforcement Learning

Times Cited: 0
Authors
Kim, Kuno [1 ]
Garg, Shivam [1 ]
Shiragur, Kirankumar [1 ]
Ermon, Stefano [1 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Palo Alto, CA 94304 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
Keywords
DYNAMIC DISCRETE-CHOICE MODELS
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of reward identifiability in the context of Inverse Reinforcement Learning (IRL). Answering the identifiability question is critical when assessing whether Markov Decision Processes (MDPs) can serve as computational models of real-world decision makers, both for understanding complex decision-making behavior and for performing counterfactual reasoning. While identifiability has been acknowledged as a fundamental theoretical question in IRL, little is known about the types of MDPs for which rewards are identifiable, or even whether such MDPs exist. In this work, we formalize the reward identification problem in IRL and study how identifiability relates to properties of the MDP model. For deterministic MDP models with the MaxEnt RL objective, we prove necessary and sufficient conditions for identifiability. Building on these results, we present efficient algorithms for testing whether an MDP model is identifiable.
Pages: 10
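
The abstract's central question, why observed behavior may fail to pin down a unique reward, can be illustrated with the classical potential-shaping ambiguity under the MaxEnt objective. The sketch below is not the paper's identifiability test; it is a minimal NumPy demonstration, assuming a made-up 3-state deterministic MDP and an arbitrary potential function Phi (all names here, including soft_policy, are illustrative), that a reward and its potential-shaped variant induce exactly the same MaxEnt (soft-optimal) policy.

```python
# Illustrative sketch only (not the paper's algorithm): under MaxEnt RL, the reward
# r and the potential-shaped reward r'(s,a) = r(s,a) + gamma*Phi(s') - Phi(s)
# yield identical soft-optimal policies, so r cannot be recovered from behavior
# alone without further conditions on the MDP. The MDP and Phi are toy assumptions.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

# Deterministic transitions: next_state[s, a] is the successor of state s under action a.
next_state = np.array([[1, 2],
                       [2, 0],
                       [0, 1]])

rng = np.random.default_rng(0)
reward = rng.normal(size=(n_states, n_actions))  # a fixed "true" reward
phi = rng.normal(size=n_states)                  # an arbitrary potential over states

def soft_policy(r, tol=1e-10):
    """Soft value iteration; returns the MaxEnt policy (softmax of the soft Q-values)."""
    v = np.zeros(n_states)
    while True:
        q = r + gamma * v[next_state]              # deterministic dynamics
        v_new = np.log(np.exp(q).sum(axis=1))      # soft (log-sum-exp) backup
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    pi = np.exp(q - q.max(axis=1, keepdims=True))  # softmax over actions, stabilized
    return pi / pi.sum(axis=1, keepdims=True)

# Potential-based shaping: r'(s,a) = r(s,a) + gamma*Phi(s') - Phi(s).
shaped = reward + gamma * phi[next_state] - phi[:, None]

print(np.allclose(soft_policy(reward), soft_policy(shaped)))  # True: same behavior
```

Running the sketch prints True: two distinct reward functions are behaviorally indistinguishable, which is exactly the kind of degeneracy that makes the identifiability conditions studied in the paper nontrivial.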