Corruption-Robust Exploration in Episodic Reinforcement Learning

Times Cited: 0
Authors
Lykouris, Thodoris [1 ]
Simchowitz, Max [2 ]
Slivkins, Aleksandrs [3 ]
Sun, Wen [4 ]
Affiliations
[1] MIT, Sloan Sch Management, Cambridge, MA 02139 USA
[2] MIT, Dept Elect Engn & Comp Sci, Cambridge, MA 02139 USA
[3] Microsoft Res Lab, New York, NY 10012 USA
[4] Cornell Univ, Dept Comp Sci, Ithaca, NY 14850 USA
Keywords
reinforcement learning; bandit feedback; exploration; robustness; regret; multiarmed bandit
DOI
10.1287/moor.2021.0202
Chinese Library Classification (CLC)
C93 (Management); O22 (Operations Research)
Discipline Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
We initiate the study of episodic reinforcement learning (RL) under adversarial corruptions in both the rewards and the transition probabilities of the underlying system, extending recent results for the special case of multiarmed bandits. We provide a framework that modifies the aggressive exploration enjoyed by existing reinforcement learning approaches based on optimism in the face of uncertainty by complementing them with principles from action elimination. Importantly, our framework circumvents the major challenges posed by naively applying action elimination in the RL setting, as formalized by a lower bound we demonstrate. Our framework yields efficient algorithms that (a) attain near-optimal regret in the absence of corruptions and (b) adapt to unknown levels of corruption, enjoying regret guarantees that degrade gracefully in the total corruption encountered. To showcase the generality of our approach, we derive results for both tabular settings (where states and actions are finite) and linear Markov decision process settings (where the dynamics and rewards admit a linear underlying representation). Notably, our work provides the first sublinear regret guarantee that accommodates any deviation from purely independent and identically distributed transitions in the bandit-feedback model for episodic reinforcement learning.
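To make the action-elimination principle in the abstract concrete, the sketch below shows successive elimination in the multiarmed-bandit special case that the paper generalizes from, with confidence radii inflated by an assumed total corruption budget so that a bounded adversary cannot force the optimal arm out of the active set. This is a minimal illustration only: the function name `successive_elimination`, the Bernoulli reward model, and the known `corruption_budget` parameter are all our assumptions, not the paper's algorithm, which handles episodic RL and adapts to unknown corruption levels.

```python
import numpy as np

def successive_elimination(means, horizon, corruption_budget, seed=0):
    """Corruption-tolerant successive elimination (illustrative sketch).

    Keeps an active set of arms and discards an arm only when its upper
    confidence bound falls below the best lower confidence bound, with
    radii widened by the assumed corruption allowance.
    """
    rng = rng = np.random.default_rng(seed)
    n_arms = len(means)
    active = list(range(n_arms))
    counts = np.zeros(n_arms)
    sums = np.zeros(n_arms)

    for t in range(1, horizon + 1):
        arm = active[t % len(active)]           # round-robin over active arms
        reward = rng.binomial(1, means[arm])    # Bernoulli reward draw
        counts[arm] += 1
        sums[arm] += reward

        if all(counts[a] > 0 for a in active):
            mu_hat = sums / np.maximum(counts, 1)
            # Hoeffding radius plus a corruption term: an adversary injecting
            # at most `corruption_budget` total reward perturbation can shift
            # an empirical mean by at most corruption_budget / counts[a].
            radius = (np.sqrt(2 * np.log(horizon) / np.maximum(counts, 1))
                      + corruption_budget / np.maximum(counts, 1))
            best_lcb = max(mu_hat[a] - radius[a] for a in active)
            active = [a for a in active if mu_hat[a] + radius[a] >= best_lcb]

    return active  # arms still plausibly optimal

# Example: the best arm (index 2) should survive despite modest corruption.
print(successive_elimination([0.3, 0.5, 0.8], horizon=5000, corruption_budget=20))
```

The widened radius is what makes elimination "safe" under corruption: the optimal arm stays active as long as the adversary's total perturbation respects the budget, at the price of slower elimination and hence regret that degrades with the corruption level, mirroring the graceful-degradation guarantee described above.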
Pages: 29