Corruption-Robust Exploration in Episodic Reinforcement Learning

Cited by: 0
Authors
Lykouris, Thodoris [1 ]
Simchowitz, Max [2 ]
Slivkins, Aleksandrs [3 ]
Sun, Wen [4 ]
Affiliations
[1] MIT, Sloan Sch Management, Cambridge, MA 02139 USA
[2] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[3] Microsoft Res Lab, New York, NY 10012 USA
[4] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Keywords
reinforcement learning; bandit feedback; exploration; robustness; regret; multiarmed bandit
DOI
10.1287/moor.2021.0202
Chinese Library Classification
C93 [Management]; O22 [Operations Research];
Discipline Codes
070105; 12; 1201; 1202; 120202
Abstract
We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multiarmed bandits. We provide a framework that modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on optimism in the face of uncertainty by complementing them with principles from action elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms that (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees that degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) and linear Markov decision process settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely independent and identically distributed transitions in the bandit-feedback model for episodic reinforcement learning.
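The elimination principle the abstract builds on can be illustrated in the multiarmed-bandit special case the paper extends. The sketch below is not the authors' algorithm; it is a minimal, self-contained example of successive elimination in which the confidence radius is enlarged by an assumed known corruption budget `corruption`, the standard mechanism by which regret degrades gracefully with the total corruption rather than breaking entirely.

```python
import math
import random

def successive_elimination(means, horizon, corruption=0.0, seed=0):
    """Successive elimination on a simulated Bernoulli bandit.

    Each round, every active arm is pulled once; arms whose upper
    confidence bound falls below the best lower confidence bound are
    eliminated. Widening the radius by corruption / pulls keeps the
    best arm active when rewards are corrupted by at most `corruption`
    in total. Returns the set of arms still active at the horizon.
    """
    rng = random.Random(seed)
    k = len(means)
    active = set(range(k))
    pulls = [0] * k
    totals = [0.0] * k
    t = 0
    while t < horizon and len(active) > 1:
        # Pull every active arm once this round.
        for a in list(active):
            if t >= horizon:
                break
            reward = 1.0 if rng.random() < means[a] else 0.0
            pulls[a] += 1
            totals[a] += reward
            t += 1
        # Hoeffding-style radius, enlarged by the corruption budget.
        def rad(a):
            return (math.sqrt(2.0 * math.log(max(t, 2)) / pulls[a])
                    + corruption / pulls[a])
        ucb = {a: totals[a] / pulls[a] + rad(a) for a in active}
        lcb = {a: totals[a] / pulls[a] - rad(a) for a in active}
        best_lcb = max(lcb.values())
        active = {a for a in active if ucb[a] >= best_lcb}
    return active
```

With a clear gap between arms (e.g. means 0.9, 0.5, 0.3), the suboptimal arms are eliminated well before a few thousand pulls while the best arm survives; setting `corruption > 0` simply delays eliminations in proportion to the budget.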
Pages: 29
Related Papers
50 records in total
  • [1] Corruption-Robust Offline Reinforcement Learning
    Zhang, Xuezhou
    Chen, Yiding
    Zhu, Jerry
    Sun, Wen
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151 : 5757 - 5773
  • [2] Corruption-Robust Offline Reinforcement Learning with General Function Approximation
    Ye, Chenlu
    Yang, Rui
    Gu, Quanquan
    Zhang, Tong
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Improved Corruption Robust Algorithms for Episodic Reinforcement Learning
    Chen, Yifang
    Du, Simon S.
    Jamieson, Kevin
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [4] Corruption-Robust Lipschitz Contextual Search
    Zuo, Shiliang
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024, 237
  • [5] Balancing exploration and exploitation in episodic reinforcement learning
    Chen, Qihang
    Zhang, Qiwei
    Liu, Yunlong
    EXPERT SYSTEMS WITH APPLICATIONS, 2023, 231
  • [6] Corruption-Robust Contextual Search through Density Updates
    Leme, Renato Paes
    Podimata, Chara
    Schneider, Jon
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [7] MetaMix: Towards Corruption-Robust Continual Learning with Temporally Self-Adaptive Data Transformation
    Wang, Zhenyi
    Shen, Li
    Zhan, Donglin
    Suo, Qiuling
    Zhu, Yanjun
    Duan, Tiehang
    Gao, Mingchen
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 24521 - 24531
  • [8] A Model Selection Approach for Corruption Robust Reinforcement Learning
    Wei, Chen-Yu
    Dann, Christoph
    Zimmert, Julian
    INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022, 167
  • [9] Robust exploration in linear quadratic reinforcement learning
    Umenberger, Jack
    Ferizbegovic, Mina
    Schon, Thomas B.
    Hjalmarsson, Hakan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [10] FSDA: Frequency re-scaling in data augmentation for corruption-robust image classification
    Nam, Ju-Hyeon
    Lee, Sang-Chul
    PATTERN RECOGNITION, 2024, 150