Multiple-Step Greedy Policies in Online and Approximate Reinforcement Learning

Times cited: 0
Authors
Efroni, Yonathan [1]
Dalal, Gal [1]
Scherrer, Bruno [2]
Mannor, Shie [1]
Affiliations
[1] Technion Israel Inst Technol, Dept Elect Engn, Haifa, Israel
[2] INRIA, Villers Les Nancy, France
Funding
Israel Science Foundation
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Multiple-step lookahead policies have demonstrated high empirical competence in Reinforcement Learning, via the use of Monte Carlo Tree Search or Model Predictive Control. In a recent work [5], multiple-step greedy policies and their use in vanilla Policy Iteration algorithms were proposed and analyzed. In this work, we study multiple-step greedy algorithms in more practical setups. We begin by highlighting a counter-intuitive difficulty that arises with soft-policy updates: even in the absence of approximations, and contrary to the 1-step greedy case, monotonic policy improvement is not guaranteed unless the update stepsize is sufficiently large. Taking particular care of this difficulty, we formulate and analyze online and approximate algorithms that use such a multiple-step greedy operator.
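The objects in the abstract can be made concrete on a small tabular MDP. The following is a minimal sketch that assumes a known finite MDP stored as NumPy arrays `P` (shape `(S, A, S)`) and `R` (shape `(S, A)`). All function names are hypothetical, and `soft_h_pi` is a generic soft policy-iteration loop for illustration only. It is not the online or approximate algorithm the paper analyzes. The h-step greedy policy is taken to be the 1-step greedy policy with respect to T^{h-1}v, where T is the Bellman optimality operator. This is equivalent to an h-step lookahead that bootstraps from v at the leaves. A soft update then mixes this policy into the current one with stepsize `alpha`.

```python
import numpy as np


def evaluate(pi, P, R, gamma):
    """Exact policy evaluation: v_pi = (I - gamma * P_pi)^{-1} r_pi."""
    P_pi = np.einsum("sa,sat->st", pi, P)  # state-to-state transitions under pi
    r_pi = np.einsum("sa,sa->s", pi, R)    # expected one-step reward under pi
    return np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, r_pi)


def h_greedy(v, P, R, gamma, h):
    """Deterministic h-step greedy policy: 1-step greedy w.r.t. T^{h-1} v."""
    w = v.copy()
    for _ in range(h - 1):
        w = np.max(R + gamma * (P @ w), axis=1)  # one application of T
    q = R + gamma * (P @ w)                      # (S, A) lookahead values
    pi_h = np.zeros_like(R)
    pi_h[np.arange(R.shape[0]), np.argmax(q, axis=1)] = 1.0
    return pi_h


def soft_update(pi, pi_h, alpha):
    """Mixture policy (1 - alpha) * pi + alpha * pi_h, with alpha in (0, 1]."""
    return (1.0 - alpha) * pi + alpha * pi_h


def greedy_gap(pi, P, R, gamma, h):
    """Return T^{pi_h} v_pi - v_pi.

    If every entry is >= 0, the mixture policy improves on pi for any alpha
    (sufficient condition only; negative entries do not prove failure).
    """
    v = evaluate(pi, P, R, gamma)
    pi_h = h_greedy(v, P, R, gamma, h)
    T_pi_h_v = np.einsum("sa,sa->s", pi_h, R + gamma * (P @ v))
    return T_pi_h_v - v


def soft_h_pi(P, R, gamma, h, alpha, iters=50):
    """Hypothetical soft h-step policy iteration; returns the value of each iterate."""
    S, A = R.shape
    pi = np.full((S, A), 1.0 / A)  # start from the uniform policy
    values = [evaluate(pi, P, R, gamma)]
    for _ in range(iters):
        pi = soft_update(pi, h_greedy(values[-1], P, R, gamma, h), alpha)
        values.append(evaluate(pi, P, R, gamma))
    return values


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    S, A = 5, 3
    P = rng.random((S, A, S))
    P /= P.sum(axis=2, keepdims=True)  # normalize each (s, a) row into a distribution
    R = rng.random((S, A))
    vals = soft_h_pi(P, R, gamma=0.9, h=3, alpha=1.0, iters=10)
    print(vals[-1])
    print(greedy_gap(np.full((S, A), 1.0 / A), P, R, 0.9, h=3))
```

For h = 1, `greedy_gap` is always nonnegative, because T^{π_1}v^π = Tv^π ≥ T^π v^π = v^π. In that case a soft update of any size improves the policy. For h > 1, the h-step greedy policy can do worse than π for a single step on v^π. This simple argument then no longer guarantees improvement for small `alpha`, which is consistent with the difficulty the abstract describes. With `alpha = 1.0` the loop reduces to plain h-step policy iteration, whose iterates improve monotonically.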
Pages: 10