Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Cited by: 0
Authors
Villaflor, Adam [1 ]
Huang, Zhe [1 ]
Pande, Swapnil [1 ]
Dolan, John [1 ]
Schneider, Jeff [1 ]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
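The abstract's central idea is that keeping the policy and world model separate lets a test-time planner evaluate each candidate behavior against several possible futures instead of a single optimistic one. The following minimal Python sketch illustrates that idea under stated assumptions; the function names (sample_future_return, robust_plan_selection), the placeholder dynamics, and the worst-case selection rule are illustrative assumptions, not the paper's actual algorithm.

# Hypothetical sketch: score each candidate action sequence under several
# sampled futures from a (stand-in) stochastic world model, then select by
# the worst case rather than the best case to counteract optimism bias.
import random

def sample_future_return(plan, seed):
    """Stand-in for a learned stochastic world model: returns the return of
    `plan` under one sampled future (placeholder dynamics plus noise)."""
    rng = random.Random(seed)
    return sum(plan) + rng.gauss(0.0, 1.0)

def robust_plan_selection(candidate_plans, num_futures=8):
    """Choose the plan whose worst sampled-future return is highest, instead
    of the plan that looks best under a single optimistic rollout."""
    best_plan, best_score = None, float("-inf")
    for plan in candidate_plans:
        worst = min(sample_future_return(plan, seed) for seed in range(num_futures))
        if worst > best_score:
            best_plan, best_score = plan, worst
    return best_plan

candidates = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.0]]
print(robust_plan_selection(candidates))

The worst-case criterion shown here is only one way to make planning robust to multiple possible futures; the point of the sketch is the separation of policy proposals from world-model rollouts, which a jointly modeled state-action sequence cannot provide.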
Pages: 14
Related Papers
50 records in total
  • [1] Addressing Hindsight Bias in Multigoal Reinforcement Learning
    Bai, Chenjia
    Wang, Lingxiao
    Wang, Yixin
    Wang, Zhaoran
    Zhao, Rui
    Bai, Chenyao
    Liu, Peng
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (01) : 392 - 405
  • [2] Addressing maximization bias in reinforcement learning with two-sample testing
    Waltz, Martin
    Okhrin, Ostap
    ARTIFICIAL INTELLIGENCE, 2024, 336
  • [3] Decision Transformer: Reinforcement Learning via Sequence Modeling
    Chen, Lili
    Lu, Kevin
    Rajeswaran, Aravind
    Lee, Kimin
    Grover, Aditya
    Laskin, Michael
    Abbeel, Pieter
    Srinivas, Aravind
    Mordatch, Igor
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Optimizing Attention for Sequence Modeling via Reinforcement Learning
    Fei, Hao
    Zhang, Yue
    Ren, Yafeng
    Ji, Donghong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3612 - 3621
  • [5] Addressing Sample Efficiency and Model-bias in Model-based Reinforcement Learning
    Anand, Akhil S.
    Kveen, Jens Erik
    Abu-Dakka, Fares
    Grotli, Esten Ingar
    Gravdahl, Jan Tommy
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022 : 1 - 6
  • [6] Adversarial learning with optimism for bias reduction in machine learning
    Cheng, Yu-Chen
    Chen, Po-An
    Chen, Feng-Chi
    Cheng, Ya-Wen
    AI AND ETHICS, 2024, 4 (4) : 1389 - 1402
  • [7] Offline Reinforcement Learning as One Big Sequence Modeling Problem
    Janner, Michael
    Li, Qiyang
    Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] Multi-Agent Reinforcement Learning is A Sequence Modeling Problem
    Wen, Muning
    Kuba, Jakub Grudzien
    Lin, Runji
    Zhang, Weinan
    Wen, Ying
    Wang, Jun
    Yang, Yaodong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022, 35
  • [9] Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning
    Kweider, Leen
    Abou Kassem, Maissa
    Sandouk, Ubai
    IEEE ACCESS, 2024, 12 : 157140 - 157148
  • [10] Rationality, Optimism and Guarantees in General Reinforcement Learning
    Sunehag, Peter
    Hutter, Marcus
    JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16 : 1345 - 1390