Multi-Step Generalized Policy Improvement by Leveraging Approximate Models

Cited by: 0
Authors
Alegre, Lucas N. [1 ,2 ]
Bazzan, Ana L. C. [1 ]
Nowe, Ann [2 ]
da Silva, Bruno C. [3 ]
Affiliations
[1] Univ Fed Rio Grande do Sul, Inst Informat, Porto Alegre, RS, Brazil
[2] Vrije Univ Brussel, Artificial Intelligence Lab, Brussels, Belgium
[3] Univ Massachusetts, Amherst, MA 01003 USA
Funding
São Paulo Research Foundation, Brazil;
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a principled method for performing zero-shot transfer in reinforcement learning (RL) by exploiting approximate models of the environment. Zero-shot transfer in RL has been investigated by leveraging methods rooted in generalized policy improvement (GPI) and successor features (SFs). Although computationally efficient, these methods are model-free: they analyze a library of policies, each solving a particular task, and identify which action the agent should take. We investigate the more general setting where, in addition to a library of policies, the agent has access to an approximate environment model. Even though model-based RL algorithms can identify near-optimal policies, they are typically computationally intensive. We introduce h-GPI, a multi-step extension of GPI that interpolates between these extremes (standard model-free GPI and fully model-based planning) as a function of a parameter, h, regulating the amount of time the agent has to reason. We prove that h-GPI's performance lower bound is strictly better than GPI's, and show that h-GPI generally outperforms GPI as h increases. Furthermore, we prove that as h increases, h-GPI's performance becomes arbitrarily less susceptible to sub-optimality in the agent's policy library. Finally, we introduce novel bounds characterizing the gains achievable by h-GPI as a function of approximation errors in both the agent's policy library and its (possibly learned) model. These bounds strictly generalize those known in the literature. We evaluate h-GPI on challenging tabular and continuous-state problems under value function approximation and show that it consistently outperforms GPI and state-of-the-art competing methods under various levels of approximation errors.
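The interpolation the abstract describes can be sketched in code: standard GPI acts greedily with respect to the maximum over the policy library's value functions, while h-GPI first plans h steps with the approximate model and only then bootstraps with that GPI value. The following is a minimal, hypothetical tabular illustration, not the authors' implementation; it assumes a deterministic approximate model exposing `step(s, a) -> (next_state, reward)` and a library of per-task Q-tables indexed as `q[state, action]`:

```python
import numpy as np

def gpi_action(s, q_library):
    """Standard (model-free, h = 0) GPI: act greedily w.r.t. the
    per-action maximum over the policy library's Q-functions."""
    return int(np.argmax(np.max([q[s] for q in q_library], axis=0)))

def h_gpi_value(s, h, model, q_library, gamma=0.99):
    """h-step lookahead value: unroll the (approximate) model for h steps,
    then bootstrap with the GPI value max_i max_a Q^{pi_i}(s, a)."""
    if h == 0:
        return max(q[s].max() for q in q_library)
    best = -np.inf
    for a in range(model.n_actions):
        s_next, r = model.step(s, a)  # deterministic model assumed for brevity
        best = max(best, r + gamma * h_gpi_value(s_next, h - 1, model, q_library, gamma))
    return best

def h_gpi_action(s, h, model, q_library, gamma=0.99):
    """h-GPI action selection: argmax over one model step followed by
    an (h-1)-step lookahead; h = 1 already uses the model once."""
    vals = []
    for a in range(model.n_actions):
        s_next, r = model.step(s, a)
        vals.append(r + gamma * h_gpi_value(s_next, h - 1, model, q_library, gamma))
    return int(np.argmax(vals))
```

With h = 0 this reduces to model-free GPI; letting h grow toward the horizon recovers full model-based planning, which is the trade-off the parameter h regulates.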
Pages: 25