Reinforcement Learning with a Corrupted Reward Channel

Cited by: 0
Authors
Everitt, Tom [1 ]
Krakovna, Victoria [2 ]
Orseau, Laurent [2 ]
Legg, Shane [2 ]
Affiliations
[1] Australian Natl Univ, Canberra, ACT, Australia
[2] DeepMind, London, England
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
No real-world reward function is perfect. Sensory errors and software bugs may result in agents getting higher (or lower) rewards than they should. For example, a reinforcement learning agent may prefer states where a sensory error gives it the maximum reward, but where the true reward is actually small. We formalise this problem as a generalised Markov Decision Problem called Corrupt Reward MDP (CRMDP). Traditional RL methods fare poorly in CRMDPs, even under strong simplifying assumptions and even when trying to compensate for the possibly corrupt rewards. Two ways around the problem are investigated. First, by giving the agent richer data, as in inverse reinforcement learning and semi-supervised reinforcement learning, reward corruption stemming from systematic sensory errors may sometimes be managed completely. Second, by using randomisation to blunt the agent's optimisation, reward corruption can be partially managed under some assumptions.
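The second mitigation is easy to see in a toy setting. The Python sketch below is a minimal illustration, not the paper's construction: the state names, reward values, and the quantile parameter q are assumptions made for the example. A greedy reward maximiser locks onto the one state where a sensory error inflates the observed reward, while an agent that randomises over the top q-fraction of options (in the spirit of quantilisation) gives up some observed reward but earns more true reward in expectation.

import random

# Illustrative toy example (not from the paper): true vs. observed rewards.
# A sensory error inflates the observed reward in the "glitch" state, so the
# observed signal there exceeds every true reward.
true_reward = {"safe_a": 0.8, "safe_b": 0.7, "glitch": 0.1}
observed_reward = {"safe_a": 0.8, "safe_b": 0.7, "glitch": 1.0}

def greedy_agent(observed):
    """Pick the state with the highest observed reward (standard RL behaviour)."""
    return max(observed, key=observed.get)

def quantilising_agent(observed, q, rng):
    """Randomise uniformly over the top q-fraction of states by observed reward.

    Blunting optimisation this way limits how much probability the agent can
    concentrate on any single, possibly corrupt, high-reward state.
    """
    ranked = sorted(observed, key=observed.get, reverse=True)
    top = ranked[: max(1, int(len(ranked) * q))]
    return rng.choice(top)

if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    greedy_return = sum(true_reward[greedy_agent(observed_reward)] for _ in range(n)) / n
    quant_return = sum(
        true_reward[quantilising_agent(observed_reward, q=0.7, rng=rng)] for _ in range(n)
    ) / n
    print(f"greedy true return:      {greedy_return:.3f}")   # stuck on the glitch state
    print(f"quantiliser true return: {quant_return:.3f}")    # spreads risk, loses less

Under these assumed values the greedy agent's true return is 0.1, while the quantiliser (which mixes the glitch state with a genuinely good one) averages roughly 0.45: randomisation does not remove the corruption, it only bounds its damage, matching the abstract's "partially managed" claim.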
Pages: 4705-4713
Number of pages: 9
Related papers
50 items in total
  • [1] Reward Reports for Reinforcement Learning
    Gilbert, Thomas Krendl
    Lambert, Nathan
    Dean, Sarah
    Zick, Tom
    Snoswell, Aaron
    Mehta, Soham
    PROCEEDINGS OF THE 2023 AAAI/ACM CONFERENCE ON AI, ETHICS, AND SOCIETY, AIES 2023, 2023, : 84 - 130
  • [2] Reward, motivation, and reinforcement learning
    Dayan, P
    Balleine, BW
    NEURON, 2002, 36 (02) : 285 - 298
  • [3] Information Directed Reward Learning for Reinforcement Learning
    Lindner, David
    Turchetta, Matteo
    Tschiatschek, Sebastian
    Ciosek, Kamil
    Krause, Andreas
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Reinforcement learning reward functions for unsupervised learning
    Fyfe, Colin
    Lai, Pei Ling
    ADVANCES IN NEURAL NETWORKS - ISNN 2007, PT 1, PROCEEDINGS, 2007, 4491 : 397 - +
  • [5] Belief Reward Shaping in Reinforcement Learning
    Marom, Ofir
    Rosman, Benjamin
    THIRTY-SECOND AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTIETH INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / EIGHTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2018, : 3762 - 3769
  • [6] Multigrid Reinforcement Learning with Reward Shaping
    Grzes, Marek
    Kudenko, Daniel
    ARTIFICIAL NEURAL NETWORKS - ICANN 2008, PT I, 2008, 5163 : 357 - 366
  • [7] Reward Shaping in Episodic Reinforcement Learning
    Grzes, Marek
    AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, : 565 - 573
  • [8] Hybrid Reward Architecture for Reinforcement Learning
    van Seijen, Harm
    Fatemi, Mehdi
    Romoff, Joshua
    Laroche, Romain
    Barnes, Tavian
    Tsang, Jeffrey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [9] Hierarchical average reward reinforcement learning
    Ghavamzadeh, Mohammad
    Mahadevan, Sridhar
    JOURNAL OF MACHINE LEARNING RESEARCH, 2007, 8 : 2629 - 2669
  • [10] Reinforcement Learning with Stochastic Reward Machines
    Corazza, Jan
    Gavran, Ivan
    Neider, Daniel
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / THE TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, : 6429 - 6436