Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Cited by: 0
Authors
Villaflor, Adam [1 ]
Huang, Zhe [1 ]
Pande, Swapnil [1 ]
Dolan, John [1 ]
Schneider, Jeff [1 ]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to disentangle the effects of the policy and world dynamics on the return. Thus, in adversarial or stochastic environments, these methods lead to overly optimistic behavior that can be dangerous in safety-critical systems like autonomous driving. In this work, we propose a method that addresses this optimism bias by explicitly disentangling the policy and world models, which allows us at test time to search for policies that are robust to multiple possible futures in the environment. We demonstrate our method's superior performance on a variety of autonomous driving tasks in simulation.
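The abstract's central idea is that keeping the policy and world model separate lets a test-time planner evaluate each candidate behavior against several possible futures instead of a single optimistic one. The following minimal Python sketch illustrates that idea under stated assumptions; the function names (sample_future_return, robust_plan_selection), the placeholder dynamics, and the worst-case selection rule are illustrative assumptions, not the paper's actual algorithm.

# Hypothetical sketch: score each candidate action sequence under several
# sampled futures from a (stand-in) stochastic world model, then select by
# the worst case rather than the best case to counteract optimism bias.
import random

def sample_future_return(plan, seed):
    """Stand-in for a learned stochastic world model: returns the return of
    `plan` under one sampled future (placeholder dynamics plus noise)."""
    rng = random.Random(seed)
    return sum(plan) + rng.gauss(0.0, 1.0)

def robust_plan_selection(candidate_plans, num_futures=8):
    """Choose the plan whose worst sampled-future return is highest, instead
    of the plan that looks best under a single optimistic rollout."""
    best_plan, best_score = None, float("-inf")
    for plan in candidate_plans:
        worst = min(sample_future_return(plan, seed) for seed in range(num_futures))
        if worst > best_score:
            best_plan, best_score = plan, worst
    return best_plan

candidates = [[0.1, 0.2], [0.5, -0.3], [0.0, 0.0]]
print(robust_plan_selection(candidates))

The worst-case criterion shown here is only one way to make planning robust to multiple possible futures; the point of the sketch is the separation of policy proposals from world-model rollouts, which a jointly modeled state-action sequence cannot provide.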
Pages: 14
Related Papers
50 records in total
  • [1] Addressing Hindsight Bias in Multigoal Reinforcement Learning
    Bai, Chenjia
    Wang, Lingxiao
    Wang, Yixin
    Wang, Zhaoran
    Zhao, Rui
    Bai, Chenyao
    Liu, Peng
    IEEE TRANSACTIONS ON CYBERNETICS, 2023, 53 (01) : 392 - 405
  • [2] Addressing maximization bias in reinforcement learning with two-sample testing
    Waltz, Martin
    Okhrin, Ostap
    ARTIFICIAL INTELLIGENCE, 2024, 336
  • [3] Decision Transformer: Reinforcement Learning via Sequence Modeling
    Chen, Lili
    Lu, Kevin
    Rajeswaran, Aravind
    Lee, Kimin
    Grover, Aditya
    Laskin, Michael
    Abbeel, Pieter
    Srinivas, Aravind
    Mordatch, Igor
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [4] Optimizing Attention for Sequence Modeling via Reinforcement Learning
    Fei, Hao
    Zhang, Yue
    Ren, Yafeng
    Ji, Donghong
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2022, 33 (08) : 3612 - 3621
  • [5] Addressing Sample Efficiency and Model-bias in Model-based Reinforcement Learning
    Anand, Akhil S.
    Kveen, Jens Erik
    Abu-Dakka, Fares
    Grotli, Esten Ingar
    Gravdahl, Jan Tommy
    2022 21ST IEEE INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND APPLICATIONS, ICMLA, 2022 : 1 - 6
  • [6] Adversarial learning with optimism for bias reduction in machine learning
    Cheng, Yu-Chen
    Chen, Po-An
    Chen, Feng-Chi
    Cheng, Ya-Wen
    AI AND ETHICS, 2024, 4 (4) : 1389 - 1402
  • [7] Offline Reinforcement Learning as One Big Sequence Modeling Problem
    Janner, Michael
    Li, Qiyang
    Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [8] Multi-Agent Reinforcement Learning is A Sequence Modeling Problem
    Wen, Muning
    Kuba, Jakub Grudzien
    Lin, Runji
    Zhang, Weinan
    Wen, Ying
    Wang, Jun
    Yang, Yaodong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022, 35
  • [9] Anomalous State Sequence Modeling to Enhance Safety in Reinforcement Learning
    Kweider, Leen
    Abou Kassem, Maissa
    Sandouk, Ubai
    IEEE ACCESS, 2024, 12 : 157140 - 157148
  • [10] Rationality, Optimism and Guarantees in General Reinforcement Learning
    Sunehag, Peter
    Hutter, Marcus
    JOURNAL OF MACHINE LEARNING RESEARCH, 2015, 16 : 1345 - 1390