Near-Optimal Model-Free Reinforcement Learning in Non-Stationary Episodic MDPs

Cited: 0
Authors
Mao, Weichao [1 ,2 ]
Zhang, Kaiqing [1 ,2 ]
Zhu, Ruihao [3 ]
Simchi-Levi, David [3 ]
Basar, Tamer [1 ,2 ]
Institutions
[1] Univ Illinois, Dept Elect & Comp Engn, Urbana, IL 61801 USA
[2] Univ Illinois, Coordinated Sci Lab, Urbana, IL 61801 USA
[3] MIT, Inst Data Syst & Soc, 77 Massachusetts Ave, Cambridge, MA 02139 USA
Keywords
DOI
None
CLC number
TP18 [Theory of Artificial Intelligence];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider model-free reinforcement learning (RL) in non-stationary Markov decision processes. Both the reward functions and the state transition functions are allowed to vary arbitrarily over time as long as their cumulative variations do not exceed certain variation budgets. We propose Restarted Q-Learning with Upper Confidence Bounds (RestartQ-UCB), the first model-free algorithm for non-stationary RL, and show that it outperforms existing solutions in terms of dynamic regret. Specifically, RestartQ-UCB with Freedman-type bonus terms achieves a dynamic regret bound of Õ(S^{1/3} A^{1/3} Δ^{1/3} H T^{2/3}), where S and A are the numbers of states and actions, respectively, Δ > 0 is the variation budget, H is the number of time steps per episode, and T is the total number of time steps. We further show that our algorithm is nearly optimal by establishing an information-theoretical lower bound of Ω(S^{1/3} A^{1/3} Δ^{1/3} H^{2/3} T^{2/3}), the first lower bound in non-stationary RL. Numerical experiments validate the advantages of RestartQ-UCB in terms of both cumulative rewards and computational efficiency. We further demonstrate the power of our results in the context of multi-agent RL, where non-stationarity is a key challenge.
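The core recipe the abstract describes — optimistic Q-learning whose statistics are periodically restarted so stale estimates do not mislead the agent after the MDP drifts — can be sketched compactly. The sketch below is illustrative only and is not the paper's algorithm: it uses a simpler Hoeffding-type bonus rather than the Freedman-type bonus, and the names (`restartq_ucb_sketch`, `restart_every`, the bonus scale `c`) are our own; in the paper the restart schedule is tuned to the variation budget Δ.

```python
import math

def restartq_ucb_sketch(step, reset, S, A, H, K, restart_every, c=1.0):
    """Illustrative restarted optimistic Q-learning for an episodic tabular MDP.

    step(s, a, h) -> (next_state, reward in [0, 1]); reset() -> initial state.
    S, A: numbers of states and actions; H: episode length; K: number of episodes.
    """
    def fresh_tables():
        # Optimistic initialization: Q starts at H, the maximum possible return.
        Q = [[[float(H)] * A for _ in range(S)] for _ in range(H)]
        N = [[[0] * A for _ in range(S)] for _ in range(H)]
        return Q, N

    Q, N = fresh_tables()
    iota = math.log(max(2, S * A * H * K))  # log factor inside the bonus
    total_reward = 0.0
    for k in range(K):
        if k > 0 and k % restart_every == 0:
            Q, N = fresh_tables()  # restart: discard estimates from the old MDP
        s = reset()
        for h in range(H):
            a = max(range(A), key=lambda x: Q[h][s][x])  # greedy w.r.t. optimistic Q
            s2, r = step(s, a, h)
            total_reward += r
            N[h][s][a] += 1
            t = N[h][s][a]
            lr = (H + 1) / (H + t)  # stage-dependent learning rate
            bonus = c * math.sqrt(H ** 3 * iota / t)  # Hoeffding-type bonus
            v_next = 0.0 if h == H - 1 else min(H, max(Q[h + 1][s2]))
            Q[h][s][a] = (1 - lr) * Q[h][s][a] + lr * (r + v_next + bonus)
            s = s2
    return total_reward
```

The restart is what targets non-stationarity: between restarts the agent behaves like stationary optimistic Q-learning, and wiping the tables bounds how long outdated transition and reward estimates can hurt, at the cost of re-exploring.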
Pages: 12
Related papers
50 records in total
  • [1] A hybrid model-free approach for the near-optimal intrusion response control of non-stationary systems
    Iannucci, Stefano
    Cardellini, Valeria
    Barba, Ovidiu Daniel
    Banicescu, Ioana
    FUTURE GENERATION COMPUTER SYSTEMS-THE INTERNATIONAL JOURNAL OF ESCIENCE, 2020, 109 : 111 - 124
  • [2] Near-optimal Reinforcement Learning in Factored MDPs
    Osband, Ian
    Van Roy, Benjamin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 27 (NIPS 2014), 2014, 27
  • [3] Near-Optimal Goal-Oriented Reinforcement Learning in Non-Stationary Environments
    Chen, Liyu
    Luo, Haipeng
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022
  • [4] Model-free Reinforcement Learning for Non-stationary Mean Field Games
    Mishra, Rajesh K.
    Vasal, Deepanshu
    Vishwanath, Sriram
    2020 59TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2020, : 1032 - 1037
  • [5] Model-Free Nonstationary Reinforcement Learning: Near-Optimal Regret and Applications in Multiagent Reinforcement Learning and Inventory Control
    Mao, Weichao
    Zhang, Kaiqing
    Zhu, Ruihao
    Simchi-Levi, David
    Basar, Tamer
    MANAGEMENT SCIENCE, 2024
  • [6] Reinforcement learning in episodic non-stationary Markovian environments
    Choi, SPM
    Zhang, NL
    Yeung, DY
    IC-AI '04 & MLMTA '04, VOL 1 AND 2, PROCEEDINGS, 2004, : 752 - 758
  • [7] A Model-free Reinforcement Learning Approach for the Energetic Control of a Building with Non-stationary User Behaviour
    Haddam, Nassim
    Boulakia, Benjamin Cohen
    Barth, Dominique
    2020 THE 4TH INTERNATIONAL CONFERENCE ON SMART GRID AND SMART CITIES (ICSGSC 2020), 2020, : 168 - 177
  • [8] Deriving a Near-optimal Power Management Policy Using Model-Free Reinforcement Learning and Bayesian Classification
    Wang, Yanzhi
    Xie, Qing
    Ammari, Ahmed
    Pedram, Massoud
    PROCEEDINGS OF THE 48TH ACM/EDAC/IEEE DESIGN AUTOMATION CONFERENCE (DAC), 2011, : 41 - 46
  • [9] Real-time dynamic pricing in a non-stationary environment using model-free reinforcement learning
    Rana, Rupal
    Oliveira, Fernando S.
    OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2014, 47 : 116 - 126
  • [10] Non-stationary Risk-Sensitive Reinforcement Learning: Near-Optimal Dynamic Regret, Adaptive Detection, and Separation Design
    Ding, Yuhao
    Jin, Ming
    Lavaei, Javad
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 7405 - 7413