Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning

Cited by: 0
Authors
Hoang, Christopher [1 ]
Sohn, Sungryull [1 ,2 ]
Choi, Jongwook [1 ]
Carvalho, Wilka [1 ]
Lee, Honglak [1 ,2 ]
Affiliations
[1] Univ Michigan, Ann Arbor, MI 48109 USA
[2] LG AI Res, Ann Arbor, MI USA
Keywords
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Operating in the real world often requires agents to learn about a complex environment and to apply this understanding to achieve a breadth of goals. This problem, known as goal-conditioned reinforcement learning (GCRL), becomes especially challenging for long-horizon goals. Current methods tackle this problem by augmenting goal-conditioned policies with graph-based planning algorithms. However, they struggle to scale to large, high-dimensional state spaces and assume access to exploration mechanisms for efficiently collecting training data. In this work, we introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments so as to obtain a policy that is proficient for any goal. SFL leverages the ability of successor features (SF) to capture transition dynamics, using them to drive exploration by estimating state novelty and to enable high-level planning by abstracting the state space as a non-parametric, landmark-based graph. We further exploit SF to directly compute a goal-conditioned policy for inter-landmark traversal, which we use to execute plans to "frontier" landmarks at the edge of the explored state space. We show in experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces and outperforms state-of-the-art baselines on long-horizon GCRL tasks (1).
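To make the abstract's mechanisms concrete, the following is a minimal illustrative sketch, not the authors' implementation, of how successor-feature (SF) vectors can drive landmark bookkeeping. Here novelty is approximated by the distance from a state's SF vector to its nearest stored landmark, edges record observed traversals between consecutive landmarks, the "frontier" is taken to be the least-visited landmark, and plans are shortest paths over the graph. The class name, threshold, unit edge costs, and visit-count frontier heuristic are assumptions made for illustration; in SFL itself, inter-landmark traversal is executed by a goal-conditioned policy computed directly from the SF.

```python
# Illustrative sketch only; assumes SF vectors `psi` come from a learned SF network.
import heapq
import numpy as np


class LandmarkGraph:
    """Non-parametric landmark graph over successor-feature (SF) vectors."""

    def __init__(self, novelty_threshold=1.0):
        self.landmarks = []      # one SF vector per landmark
        self.visits = []         # visitation counts, used to pick a "frontier"
        self.edges = {}          # adjacency: landmark index -> {neighbor: cost}
        self.novelty_threshold = novelty_threshold

    def nearest(self, psi):
        """Index and distance of the stored landmark closest to `psi`."""
        dists = [float(np.linalg.norm(psi - lm)) for lm in self.landmarks]
        i = int(np.argmin(dists))
        return i, dists[i]

    def observe(self, psi, prev_idx=None):
        """Map an SF vector to a landmark; add a new landmark if it is novel."""
        if not self.landmarks or self.nearest(psi)[1] > self.novelty_threshold:
            idx = len(self.landmarks)
            self.landmarks.append(psi)
            self.visits.append(0)
            self.edges[idx] = {}
        else:
            idx, _ = self.nearest(psi)
        self.visits[idx] += 1
        if prev_idx is not None and prev_idx != idx:
            # Record an observed traversal between consecutive landmarks.
            self.edges[prev_idx][idx] = 1.0
            self.edges[idx][prev_idx] = 1.0
        return idx

    def frontier(self):
        """Least-visited landmark, standing in for the edge of explored space."""
        return int(np.argmin(self.visits))

    def plan(self, src, dst):
        """Dijkstra shortest path from `src` to `dst`, or None if unreachable."""
        dist, prev = {src: 0.0}, {}
        pq = [(0.0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if u == dst:
                break
            if d > dist[u]:
                continue
            for v, w in self.edges[u].items():
                if d + w < dist.get(v, float("inf")):
                    dist[v], prev[v] = d + w, u
                    heapq.heappush(pq, (d + w, v))
        if dst != src and dst not in prev:
            return None
        path = [dst]
        while path[-1] != src:
            path.append(prev[path[-1]])
        return path[::-1]
```

Under these assumptions, an exploration loop would repeatedly call observe() on the current SF embedding, call plan(current, frontier()) to reach the frontier landmark, and then explore outward from there.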
Pages: 13
Related Papers
50 items in total
  • [1] Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors
    Pertsch, Karl
    Rybkin, Oleh
    Ebert, Frederik
    Finn, Chelsea
    Jayaraman, Dinesh
    Levine, Sergey
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] Intelligent redundant manipulation for long-horizon operations with multiple goal-conditioned hierarchical learning
    Zhou, Haoran
    Lin, Xiankun
    ADVANCED ROBOTICS, 2025, 39 (06) : 291 - 304
  • [3] Goal exploration augmentation via pre-trained skills for sparse-reward long-horizon goal-conditioned reinforcement learning
    Wu, Lisheng
    Chen, Ke
    MACHINE LEARNING, 2024, 113 (05) : 2527 - 2557
  • [4] Policy Learning via Skill-step Abstraction for Long-horizon Goal-Conditioned Tasks
    Kim, Donghoon
    Yoo, Minjong
    Woo, Honguk
    PROCEEDINGS OF THE THIRTY-THIRD INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2024, 2024, : 4282 - 4290
  • [5] Contrastive Learning as Goal-Conditioned Reinforcement Learning
    Eysenbach, Benjamin
    Zhang, Tianjun
    Levine, Sergey
    Salakhutdinov, Ruslan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [6] Goal-Conditioned Reinforcement Learning with Imagined Subgoals
    Chane-Sane, Elliot
    Schmid, Cordelia
    Laptev, Ivan
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [7] State Representation Learning for Goal-Conditioned Reinforcement Learning
    Steccanella, Lorenzo
    Jonsson, Anders
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT IV, 2023, 13716 : 84 - 99
  • [8] Bisimulation Makes Analogies in Goal-Conditioned Reinforcement Learning
    Hansen-Estruch, Philippe
    Zhang, Amy
    Nair, Ashvin
    Yin, Patrick
    Levine, Sergey
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022,
  • [9] Curriculum Goal-Conditioned Imitation for Offline Reinforcement Learning
    Feng, Xiaoyun
    Jiang, Li
    Yu, Xudong
    Xu, Haoran
    Sun, Xiaoyan
    Wang, Jie
    Zhan, Xianyuan
    Chan, Wai Kin
    IEEE TRANSACTIONS ON GAMES, 2024, 16 (01) : 102 - 112