Multi-Step Generalized Policy Improvement by Leveraging Approximate Models

Citations: 0
Authors
Alegre, Lucas N. [1 ,2 ]
Bazzan, Ana L. C. [1 ]
Nowé, Ann [2]
da Silva, Bruno C. [3 ]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[2] Vrije Univ Brussel, Artificial Intelligence Lab, Brussels, Belgium
[3] Univ Massachusetts, Amherst, MA 01003 USA
Funding
São Paulo Research Foundation (FAPESP), Brazil
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of artificial intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify which action the agent should take. We investigate the more general setting where, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive. We introduce h-GPI, a multi-step extension of GPI that interpolates between these extremes (standard model-free GPI and fully model-based planning) as a function of a parameter, h, regulating the amount of time the agent has to reason. We prove that h-GPI's performance lower bound is strictly better than GPI's, and show that h-GPI generally outperforms GPI as h increases. Furthermore, we prove that as h increases, h-GPI's performance becomes arbitrarily less susceptible to sub-optimality in the agent's policy library. Finally, we introduce novel bounds characterizing the gains achievable by h-GPI as a function of approximation errors in both the agent's policy library and its (possibly learned) model. These bounds strictly generalize those known in the literature. We evaluate h-GPI on challenging tabular and continuous-state problems under value function approximation and show that it consistently outperforms GPI and state-of-the-art competing methods under various levels of approximation errors.
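Based only on the description above, the following is a minimal illustrative sketch of the h-GPI action-selection idea, not the authors' implementation. The names q_library (a list of per-policy action-value dictionaries indexed by (state, action) pairs), model (a callable returning a predicted next state and reward), actions, and gamma are hypothetical placeholders, and a deterministic approximate model is assumed so the h-step backup can be written without an expectation over next states.

```python
# Illustrative sketch only: q_library, model, actions, and gamma are hypothetical
# names, and a deterministic approximate model is assumed for simplicity.

def gpi_value(s, q_library, actions):
    """Model-free GPI value of state s: max over actions and library policies."""
    return max(q[(s, a)] for q in q_library for a in actions)

def h_gpi_value(s, h, q_library, model, actions, gamma):
    """h-step lookahead value in the approximate model, bootstrapped with the
    GPI value at the leaves; h = 0 recovers standard GPI."""
    if h == 0:
        return gpi_value(s, q_library, actions)
    # Deterministic model assumed: model(s, a) -> (next_state, reward).
    return max(
        r + gamma * h_gpi_value(s_next, h - 1, q_library, model, actions, gamma)
        for a in actions
        for (s_next, r) in [model(s, a)]
    )

def h_gpi_action(s, h, q_library, model, actions, gamma):
    """Greedy action with respect to the h-step lookahead values."""
    if h == 0:
        # Standard GPI action selection: argmax_a max_i Q^{pi_i}(s, a).
        return max(actions, key=lambda a: max(q[(s, a)] for q in q_library))

    def backup(a):
        s_next, r = model(s, a)
        return r + gamma * h_gpi_value(s_next, h - 1, q_library, model, actions, gamma)

    return max(actions, key=backup)
```

With h = 0, h_gpi_action reduces to ordinary model-free GPI; increasing h deepens the lookahead in the approximate model at exponentially growing cost, mirroring the time-to-reason trade-off described in the abstract.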
Pages: 25
Related Papers (50 records in total)
  • [1] A multi-step procedure to determine the number of factors in large approximate factor models. Luo, Ronghua; Jiang, Jiakun; Lan, Wei; Yan, Chengliang; Ding, Yue. Communications in Statistics - Theory and Methods, 2021, 50(17): 3988-3999.
  • [2] Multi-step H∞ generalized predictive control. Grimble, M. J. Dynamics and Control, 1998, 8(4): 303-339.
  • [3] Multi-step Training of a Generalized Linear Classifier. Tyagi, Kanishka; Manry, Michael. Neural Processing Letters, 2019, 50(2): 1341-1360.
  • [4] Multi-step Training of a Generalized Linear Classifier. Tyagi, Kanishka; Manry, Michael. Neural Processing Letters, 2019, 50: 1341-1360.
  • [5] Understanding world models through multi-step pruning policy via reinforcement learning. He, Zhiqiang; Qiu, Wen; Zhao, Wei; Shao, Xun; Liu, Zhi. Information Sciences, 2025, 686.
  • [6] Improvement of multi-step brittle-plastic approach. Jin, J.-C.; Jing, L.-H.; Yang, F.-W.; Song, Z.-Y.; Shang, P.-Y. Journal of Zhejiang University (Engineering Science), 2023, 57(9): 1706-1717.
  • [7] Transcranial Image Quality Improvement with a Multi-step Approach. Vignon, Francois; Shi, William; Shamdasani, Vijay; Kalman, Paul; Maxwell, Doug; Powers, Jeffry. 2013 IEEE International Ultrasonics Symposium (IUS), 2013: 1276-1279.
  • [8] A generalized feature projection scheme for multi-step traffic forecasting. Zeb, Adnan; Zhang, Shiyao; Wei, Xuetao; Yu, James Jianqiao. Expert Systems with Applications, 2024, 244.
  • [9] Generalized Convergence for Multi-Step Schemes under Weak Conditions. Behl, Ramandeep; Argyros, Ioannis K.; Alshehri, Hashim; Regmi, Samundra. Mathematics, 2024, 12(2).
  • [10] Generalized rational multi-step method for delay differential equations. Vinci Shaalini, J.; Emimal Kanaga Pushpam, A. IAENG International Journal of Applied Mathematics, 2020, 50(1): 87-95.