Reward Identification in Inverse Reinforcement Learning

Times Cited: 0
Authors
Kim, Kuno [1 ]
Garg, Shivam [1 ]
Shiragur, Kirankumar [1 ]
Ermon, Stefano [1 ]
Affiliations
[1] Stanford Univ, Dept Comp Sci, Palo Alto, CA 94304 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021
Keywords
DYNAMIC DISCRETE-CHOICE MODELS
DOI
Not available
CLC Number
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the problem of reward identifiability in the context of Inverse Reinforcement Learning (IRL). Answering the identifiability question is critical when assessing whether Markov Decision Processes (MDPs) can serve as computational models of real-world decision makers, both for understanding complex decision-making behavior and for performing counterfactual reasoning. While identifiability has been acknowledged as a fundamental theoretical question in IRL, little is known about the types of MDPs for which rewards are identifiable, or even whether such MDPs exist. In this work, we formalize the reward identification problem in IRL and study how identifiability relates to properties of the MDP model. For deterministic MDP models with the MaxEnt RL objective, we prove necessary and sufficient conditions for identifiability. Building on these results, we present efficient algorithms for testing whether an MDP model is identifiable.
Pages: 10
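
The abstract's central question, why observed behavior may fail to pin down a unique reward, can be illustrated with the classical potential-shaping ambiguity under the MaxEnt objective. The sketch below is not the paper's identifiability test; it is a minimal NumPy demonstration, assuming a made-up 3-state deterministic MDP and an arbitrary potential function Phi (all names here, including soft_policy, are illustrative), that a reward and its potential-shaped variant induce exactly the same MaxEnt (soft-optimal) policy.

```python
# Illustrative sketch only (not the paper's algorithm): under MaxEnt RL, the reward
# r and the potential-shaped reward r'(s,a) = r(s,a) + gamma*Phi(s') - Phi(s)
# yield identical soft-optimal policies, so r cannot be recovered from behavior
# alone without further conditions on the MDP. The MDP and Phi are toy assumptions.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9

# Deterministic transitions: next_state[s, a] is the successor of state s under action a.
next_state = np.array([[1, 2],
                       [2, 0],
                       [0, 1]])

rng = np.random.default_rng(0)
reward = rng.normal(size=(n_states, n_actions))  # a fixed "true" reward
phi = rng.normal(size=n_states)                  # an arbitrary potential over states

def soft_policy(r, tol=1e-10):
    """Soft value iteration; returns the MaxEnt policy (softmax of the soft Q-values)."""
    v = np.zeros(n_states)
    while True:
        q = r + gamma * v[next_state]              # deterministic dynamics
        v_new = np.log(np.exp(q).sum(axis=1))      # soft (log-sum-exp) backup
        if np.max(np.abs(v_new - v)) < tol:
            break
        v = v_new
    pi = np.exp(q - q.max(axis=1, keepdims=True))  # softmax over actions, stabilized
    return pi / pi.sum(axis=1, keepdims=True)

# Potential-based shaping: r'(s,a) = r(s,a) + gamma*Phi(s') - Phi(s).
shaped = reward + gamma * phi[next_state] - phi[:, None]

print(np.allclose(soft_policy(reward), soft_policy(shaped)))  # True: same behavior
```

Running the sketch prints True: two distinct reward functions are behaviorally indistinguishable, which is exactly the kind of degeneracy that makes the identifiability conditions studied in the paper nontrivial.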