Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs

Cited by: 0
Authors
Mao, Weichao [1 ,2 ]
Zhang, Kaiqing [1 ,2 ]
Zhu, Ruihao [3 ]
Simchi-Levi, David [3 ]
Basar, Tamer [1 ,2 ]
Affiliations
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[3] MIT, Inst Data Syst & Soc, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC): TP18 [Artificial Intelligence Theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time, as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of $\widetilde{O}(S^{1/3} A^{1/3} \Delta^{1/3} H T^{2/3})$, where $S$ and $A$ are the numbers of states and actions, respectively, $\Delta > 0$ is the variation budget, $H$ is the number of time steps per episode, and $T$ is the total number of time steps. We further show that our algorithm is nearly optimal by establishing an information-theoretic lower bound of $\Omega(S^{1/3} A^{1/3} \Delta^{1/3} H^{2/3} T^{2/3})$, the first lower bound in non-stationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We further demonstrate the power of our results in the context of multi-agent RL, where non-stationarity is a key challenge.
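
To make the restart-plus-optimism idea concrete, below is a minimal Python sketch of a RestartQ-UCB-style learner, not the authors' reference implementation. It uses the simpler Hoeffding-type exploration bonus (the bound quoted in the abstract uses tighter Freedman-type, variance-aware bonuses), and the epoch count `epochs`, the bonus constant `c`, and the confidence level `delta` are illustrative assumptions; in the paper the restart schedule is tuned from the variation budget $\Delta$.

    import numpy as np

    def restartq_ucb(env_reset, env_step, S, A, H, K, epochs=4, c=1.0, delta=0.05):
        # Illustrative sketch: optimistic Q-learning with Hoeffding-type
        # bonuses, restarted from scratch at the start of each epoch so
        # that stale data from a drifted MDP cannot bias the estimates.
        iota = np.log(S * A * H * K / delta)         # log term inside the bonus
        ep_per_epoch = max(1, K // epochs)           # the paper tunes this via Delta
        total_reward = 0.0
        for k in range(K):
            if k % ep_per_epoch == 0:                # restart: wipe all history
                Q = np.full((H, S, A), float(H))     # optimistic initialization
                V = np.full((H + 1, S), float(H))    # optimistic value estimates
                V[H] = 0.0                           # no reward after step H
                N = np.zeros((H, S, A), dtype=int)
            s = env_reset()
            for h in range(H):
                a = int(np.argmax(Q[h, s]))          # act greedily on optimistic Q
                s2, r = env_step(h, s, a)
                total_reward += r
                N[h, s, a] += 1
                t = N[h, s, a]
                alpha = (H + 1) / (H + t)            # step size of optimistic Q-learning
                bonus = c * np.sqrt(H ** 3 * iota / t)  # Hoeffding-type bonus
                Q[h, s, a] = ((1 - alpha) * Q[h, s, a]
                              + alpha * (r + V[h + 1, s2] + bonus))
                V[h, s] = min(float(H), Q[h, s].max())
                s = s2
        return total_reward

A toy drifting environment to exercise the sketch; the drift pattern and all constants here are made up for illustration:

    rng = np.random.default_rng(0)
    S, A, H, K = 2, 2, 5, 2000
    episode = [0]

    def env_reset():
        episode[0] += 1
        return 0

    def env_step(h, s, a):
        # The identity of the better action flips every 500 episodes,
        # a simple form of bounded non-stationarity.
        good = (episode[0] // 500) % 2
        r = float(rng.random() < (0.9 if a == good else 0.1))
        return int(rng.integers(S)), r

    print(restartq_ucb(env_reset, env_step, S, A, H, K, epochs=4))

Restarting is what converts a stationary optimism guarantee into a dynamic-regret one: each epoch faces a nearly stationary MDP, and the epoch length trades off estimation accuracy within an epoch against the drift accumulated during it.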
Pages: 12
Related Papers (50 items in total; items [41] to [50] shown below)
  • [41] Selecting Near-Optimal Approximate State Representations in Reinforcement Learning
    Ortner, Ronald
    Maillard, Odalric-Ambrym
    Ryabko, Daniil
    Algorithmic Learning Theory (ALT 2014), 2014, 8776 : 140 - 154
  • [42] Model-free PAC Time-Optimal Control Synthesis with Reinforcement Learning
    Liu, Mengyu
    Lu, Pengyuan
    Chen, Xin
    Sokolsky, Oleg
    Lee, Insup
    Kong, Fanxin
    2024 22ND ACM-IEEE INTERNATIONAL SYMPOSIUM ON FORMAL METHODS AND MODELS FOR SYSTEM DESIGN, MEMOCODE 2024, 2024, : 34 - 45
  • [43] Learning Representations in Model-Free Hierarchical Reinforcement Learning
    Rafati, Jacob
    Noelle, David C.
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 10009 - 10010
  • [44] Model-Free Trajectory Optimization for Reinforcement Learning
    Akrour, Riad
    Abdolmaleki, Abbas
    Abdulsamad, Hany
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [45] Model-Free Active Exploration in Reinforcement Learning
    Russo, Alessio
    Proutiere, Alexandre
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [46] Model-Free Quantum Control with Reinforcement Learning
    Sivak, V. V.
    Eickbusch, A.
    Liu, H.
    Royer, B.
Tsioutsios, I.
    Devoret, M. H.
    PHYSICAL REVIEW X, 2022, 12 (01)
  • [47] Online Nonstochastic Model-Free Reinforcement Learning
    Ghai, Udaya
    Gupta, Arushi
    Xia, Wenhan
    Singh, Karan
    Hazan, Elad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [48] Model-Free Reinforcement Learning Algorithms: A Survey
    Calisir, Sinan
    Pehlivanoglu, Meltem Kurt
    2019 27TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE (SIU), 2019,
  • [49] Recovering Robustness in Model-Free Reinforcement Learning
    Venkataraman, Harish K.
    Seiler, Peter J.
    2019 AMERICAN CONTROL CONFERENCE (ACC), 2019, : 4210 - 4216
  • [50] Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
    Li, Chang
    de Rijke, Maarten
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2859 - 2865