Multi-Step Generalized Policy Improvement by Leveraging Approximate Models

Cited: 0
Authors
Alegre, Lucas N. [1 ,2 ]
Bazzan, Ana L. C. [1 ]
Nowe, Ann [2 ]
da Silva, Bruno C. [3 ]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[2] Vrije Univ Brussel, Artificial Intelligence Lab, Brussels, Belgium
[3] Univ Massachusetts, Amherst, MA 01003 USA
Funding
São Paulo Research Foundation, Brazil;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify which action the agent should take. We investigate the more general setting where, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive. We introduce h-GPI, a multi-step extension of GPI that interpolates between these extremes (standard model-free GPI and fully model-based planning) as a function of a parameter, h, regulating the amount of time the agent has to reason. We prove that h-GPI's performance lower bound is strictly better than GPI's, and show that h-GPI generally outperforms GPI as h increases. Furthermore, we prove that as h increases, h-GPI's performance becomes arbitrarily less susceptible to sub-optimality in the agent's policy library. Finally, we introduce novel bounds characterizing the gains achievable by h-GPI as a function of approximation errors in both the agent's policy library and its (possibly learned) model. These bounds strictly generalize those known in the literature. We evaluate h-GPI on challenging tabular and continuous-state problems under value function approximation and show that it consistently outperforms GPI and state-of-the-art competing methods under various levels of approximation errors.
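The interpolation the abstract describes (model-free GPI at one extreme, full model-based planning at the other) can be made concrete with a small tabular sketch. Everything below is a hypothetical illustration of the idea, not the authors' implementation: `q_library[i, s, a]` is assumed to hold Q^{pi_i}(s, a) for each policy in the library, and `(P, R)` is the approximate transition/reward model.

```python
import numpy as np

def gpi_action(s, q_library):
    """Model-free GPI: act greedily w.r.t. the upper envelope of the
    policy library's value functions, max_i Q^{pi_i}(s, a)."""
    return int(np.argmax(np.max(q_library[:, s, :], axis=0)))

def h_gpi_value(s, h, q_library, P, R, gamma):
    """Value of an h-step lookahead through the approximate model (P, R),
    bootstrapping at the horizon with the GPI value max_i max_a Q^{pi_i}(s, a).
    h = 0 reduces to plain GPI; growing h approaches full planning."""
    if h == 0:
        return float(np.max(q_library[:, s, :]))
    n_actions, n_states = P.shape[1], P.shape[2]
    backups = [
        R[s, a] + gamma * sum(
            P[s, a, s2] * h_gpi_value(s2, h - 1, q_library, P, R, gamma)
            for s2 in range(n_states)
        )
        for a in range(n_actions)
    ]
    return max(backups)

def h_gpi_action(s, h, q_library, P, R, gamma):
    """Greedy first action of the h-step lookahead."""
    n_actions, n_states = P.shape[1], P.shape[2]
    backups = [
        R[s, a] + gamma * sum(
            P[s, a, s2] * h_gpi_value(s2, h - 1, q_library, P, R, gamma)
            for s2 in range(n_states)
        )
        for a in range(n_actions)
    ]
    return int(np.argmax(backups))
```

The exhaustive recursion is exponential in h and is only meant to make the trade-off explicit: h = 0 spends no planning time and fully trusts the library, while each additional lookahead step buys computation-for-robustness against sub-optimal library policies, mirroring the role of h described in the abstract.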
Pages: 25