Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

Cited by: 0
Authors
Li, Yunfei [1]
Gao, Tian [1]
Yang, Jiaqi [2]
Xu, Huazhe [3]
Wu, Yi [1,4]
Affiliations
[1] Tsinghua Univ, Inst Interdisciplinary Informat Sci, Beijing, Peoples R China
[2] Univ Calif Berkeley, Dept Elect Engn & Comp Sci, Berkeley, CA USA
[3] Stanford Univ, Stanford, CA USA
[4] Shanghai Qi Zhi Inst, Shanghai, Peoples R China
Keywords
DOI
N/A
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
It has been a recent trend to leverage the power of supervised learning (SL) towards more effective reinforcement learning (RL) methods. We propose a novel phasic approach that alternates between online RL and offline SL for tackling sparse-reward goal-conditioned problems. In the online phase, we perform RL training and collect rollout data; in the offline phase, we perform SL on the successful trajectories from the dataset. To further improve sample efficiency, we adopt additional techniques in the online phase, including task reduction to generate more feasible trajectories and a value-difference-based intrinsic reward to alleviate the sparse-reward issue. We call this overall algorithm PhAsic self-Imitative Reduction (PAIR). PAIR substantially outperforms both non-phasic RL and phasic SL baselines on sparse-reward goal-conditioned robotic control problems, including a challenging stacking task. PAIR is the first RL method that learns to stack 6 cubes with only 0/1 success rewards from scratch.
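As a reading aid, the alternation the abstract describes can be sketched as a training loop. The sketch below is a hypothetical Python illustration, not the authors' implementation: the names Trajectory, SuccessBuffer, collect_rollout, behavior_clone, and the phase counts are all assumptions, and the task-reduction and intrinsic-reward components of the online phase are left out.

# A minimal illustrative sketch of the phasic loop described above, assuming
# hypothetical helpers: collect_rollout (one online RL episode) and
# behavior_clone (one offline SL update on successful trajectories).
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Trajectory:
    observations: list
    actions: list
    success: bool  # 0/1 sparse success signal received at episode end

@dataclass
class SuccessBuffer:
    trajectories: List[Trajectory] = field(default_factory=list)

    def add(self, traj: Trajectory) -> None:
        # The offline phase imitates only successful rollouts.
        if traj.success:
            self.trajectories.append(traj)

def phasic_training(
    collect_rollout: Callable[[], Trajectory],           # online RL step + rollout
    behavior_clone: Callable[[List[Trajectory]], None],  # offline SL update
    num_phases: int = 10,
    rollouts_per_phase: int = 100,
) -> SuccessBuffer:
    buffer = SuccessBuffer()
    for _ in range(num_phases):
        # Online phase: RL training while collecting rollout data.
        for _ in range(rollouts_per_phase):
            buffer.add(collect_rollout())
        # Offline phase: supervised learning on the successes gathered so far.
        if buffer.trajectories:
            behavior_clone(buffer.trajectories)
    return buffer

The structural point this sketch captures is that the SL phase consumes only the success-filtered data that the RL phase produces, which is what distinguishes the phasic scheme from running RL and self-imitation updates concurrently.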
Pages: 17