Multi-Step Generalized Policy Improvement by Leveraging Approximate Models

Cited: 0
Authors
Alegre, Lucas N. [1 ,2 ]
Bazzan, Ana L. C. [1 ]
Nowe, Ann [2 ]
da Silva, Bruno C. [3 ]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[2] Vrije Univ Brussel, Artificial Intelligence Lab, Brussels, Belgium
[3] Univ Massachusetts, Amherst, MA 01003 USA
Funding
São Paulo Research Foundation, Brazil;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify which action the agent should take. We investigate the more general setting where, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive. We introduce h-GPI, a multi-step extension of GPI that interpolates between these extremes (standard model-free GPI and fully model-based planning) as a function of a parameter, h, regulating the amount of time the agent has to reason. We prove that h-GPI's performance lower bound is strictly better than GPI's, and show that h-GPI generally outperforms GPI as h increases. Furthermore, we prove that as h increases, h-GPI's performance becomes arbitrarily less susceptible to sub-optimality in the agent's policy library. Finally, we introduce novel bounds characterizing the gains achievable by h-GPI as a function of approximation errors in both the agent's policy library and its (possibly learned) model. These bounds strictly generalize those known in the literature. We evaluate h-GPI on challenging tabular and continuous-state problems under value function approximation and show that it consistently outperforms GPI and state-of-the-art competing methods under various levels of approximation errors.
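The interpolation the abstract describes (model-free GPI at one extreme, full model-based planning at the other) can be made concrete with a small tabular sketch. Everything below is a hypothetical illustration of the idea, not the authors' implementation: `q_library[i, s, a]` is assumed to hold Q^{pi_i}(s, a) for each policy in the library, and `(P, R)` is the approximate transition/reward model.

```python
import numpy as np

def gpi_action(s, q_library):
    """Model-free GPI: act greedily w.r.t. the upper envelope of the
    policy library's value functions, max_i Q^{pi_i}(s, a)."""
    return int(np.argmax(np.max(q_library[:, s, :], axis=0)))

def h_gpi_value(s, h, q_library, P, R, gamma):
    """Value of an h-step lookahead through the approximate model (P, R),
    bootstrapping at the horizon with the GPI value max_i max_a Q^{pi_i}(s, a).
    h = 0 reduces to plain GPI; growing h approaches full planning."""
    if h == 0:
        return float(np.max(q_library[:, s, :]))
    n_actions, n_states = P.shape[1], P.shape[2]
    backups = [
        R[s, a] + gamma * sum(
            P[s, a, s2] * h_gpi_value(s2, h - 1, q_library, P, R, gamma)
            for s2 in range(n_states)
        )
        for a in range(n_actions)
    ]
    return max(backups)

def h_gpi_action(s, h, q_library, P, R, gamma):
    """Greedy first action of the h-step lookahead."""
    n_actions, n_states = P.shape[1], P.shape[2]
    backups = [
        R[s, a] + gamma * sum(
            P[s, a, s2] * h_gpi_value(s2, h - 1, q_library, P, R, gamma)
            for s2 in range(n_states)
        )
        for a in range(n_actions)
    ]
    return int(np.argmax(backups))
```

The exhaustive recursion is exponential in h and is only meant to make the trade-off explicit: h = 0 spends no planning time and fully trusts the library, while each additional lookahead step buys computation-for-robustness against sub-optimal library policies, mirroring the role of h described in the abstract.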
Pages: 25