Learning future representation with synthetic observations for sample-efficient reinforcement learning

Cited by: 0
Authors
Xin LIU [1,2]
Yaran CHEN [1,2]
Haoran LI [1,2]
Dongbin ZHAO [1,2]
Affiliations
[1] State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences
[2] School of Artificial Intelligence, University of Chinese Academy of Sciences
Keywords: none listed
DOI: not available
CLC classification
TP18 [Theory of artificial intelligence]; TP391.41
Discipline codes
081104; 0812; 0835; 1405; 080203
Abstract
Image-based reinforcement learning (RL) has proven effective for continuous visual control of embodied agents, where upstream representation learning largely determines the effectiveness of policy learning. Employing self-supervised auxiliary tasks allows the agent to enhance its visual representation in a targeted manner, thereby improving policy performance and RL sample efficiency. Prior advanced self-supervised RL methods all try to design better auxiliary objectives to extract more information from agent experience, while ignoring the training-data constraints imposed by the limited experience available during RL training. In this article, we attempt to break through this auxiliary training-data constraint by proposing a novel RL auxiliary task named learning future representation with synthetic observations (LFS), which improves self-supervised RL by enriching the auxiliary training data. First, a novel training-free method, named frame mask, is proposed to synthesize novel observations that may contain future information. Next, the latent nearest-neighbor clip (LNC) is proposed to alleviate the impact of unqualified noise in the synthetic observations. The remaining synthetic observations and the real observations then together serve as auxiliary training data for a clustering-based temporal association task that advances representation learning. LFS allows the agent to access and learn from observations that are not present in the current experience but will appear in future training, enabling comprehensive visual understanding and an efficient RL process. In addition, LFS does not rely on rewards or actions, giving it a wider scope of application (e.g., learning from video) than recent advanced RL auxiliary tasks. We conduct extensive experiments on challenging continuous visual control of complex embodied agents, including robot locomotion and manipulation. The results demonstrate that LFS achieves state-of-the-art sample efficiency on end-to-end RL tasks (leading on 12 of 13 tasks) and enables advanced RL visual pre-training on action-free video demonstrations (outperforming the next best method by 1.51×).
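As a reading aid, the sketch below illustrates the two-stage data pipeline the abstract outlines: a training-free frame mask that synthesizes candidate future observations from stacked frames, followed by a latent nearest-neighbor clip (LNC) that filters out synthetic observations lying too far from real ones in latent space. The specific masking rule (dropping the oldest frame and duplicating the newest), the stand-in linear encoder, and the threshold tau are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of an LFS-style auxiliary-data pipeline.
# Assumptions (hypothetical, not from the paper): the frame-mask rule,
# the stand-in encoder, and the distance threshold `tau`.
import torch


def frame_mask(obs_stack: torch.Tensor) -> torch.Tensor:
    """Synthesize a pseudo-future observation from a (B, K, C, H, W) frame stack.

    Assumed rule: drop the oldest frame and duplicate the newest one, so the
    stack resembles an observation one step ahead.
    """
    shifted = obs_stack[:, 1:]                  # drop the oldest frame
    newest = obs_stack[:, -1:]                  # duplicate the newest frame
    return torch.cat([shifted, newest], dim=1)  # same shape as the input


@torch.no_grad()
def latent_nearest_neighbor_clip(encoder, real_obs, synth_obs, tau=1.0):
    """Keep only synthetic observations whose latent code lies within
    distance `tau` of some real observation's latent (noise filtering)."""
    z_real = encoder(real_obs.flatten(1))   # (N, d) real latents
    z_syn = encoder(synth_obs.flatten(1))   # (M, d) synthetic latents
    dists = torch.cdist(z_syn, z_real)      # (M, N) pairwise distances
    keep = dists.min(dim=1).values < tau    # nearest-neighbor test
    return synth_obs[keep]


if __name__ == "__main__":
    encoder = torch.nn.Linear(4 * 3 * 8 * 8, 32)  # stand-in encoder
    real = torch.rand(16, 4, 3, 8, 8)             # 16 stacks of 4 RGB frames
    synth = frame_mask(real)                      # training-free synthesis
    kept = latent_nearest_neighbor_clip(encoder, real, synth, tau=5.0)
    print(f"kept {kept.shape[0]} / {synth.shape[0]} synthetic observations")
```

Per the abstract, the surviving synthetic observations would then be pooled with the real observations as auxiliary training data for the clustering-based temporal association task.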
Pages: 20-37
Page count: 18
Related papers (50 total)
  • [1] Learning future representation with synthetic observations for sample-efficient reinforcement learning
    Liu, Xin
    Chen, Yaran
    Li, Haoran
    Zhao, Dongbin
    Science China Information Sciences, 2025, 68 (5)
  • [2] Sample-efficient Reinforcement Learning Representation Learning with Curiosity Contrastive Forward Dynamics Model
    Nguyen, Thanh
    Luu, Tung M.
    Vu, Thang
    Yoo, Chang D.
    2021 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2021, : 3471 - 3477
  • [3] Sample-Efficient Reinforcement Learning of Undercomplete POMDPs
    Jin, Chi
    Kakade, Sham M.
    Krishnamurthy, Akshay
    Liu, Qinghua
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [4] Sample-efficient Reinforcement Learning in Robotic Table Tennis
    Tebbe, Jonas
    Krauch, Lukas
    Gao, Yapeng
    Zell, Andreas
    2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 4171 - 4178
  • [5] Sample-Efficient Reinforcement Learning in the Presence of Exogenous Information
    Efroni, Yonathan
    Foster, Dylan J.
    Misra, Dipendra
    Krishnamurthy, Akshay
    Langford, John
    CONFERENCE ON LEARNING THEORY, VOL 178, 2022, 178
  • [6] Sample-efficient reinforcement learning for CERN accelerator control
    Kain, Verena
    Hirlander, Simon
    Goddard, Brennan
    Velotti, Francesco Maria
    Porta, Giovanni Zevi Della
    Bruchon, Niky
    Valentino, Gianluca
    PHYSICAL REVIEW ACCELERATORS AND BEAMS, 2020, 23 (12)
  • [7] A New Sample-Efficient PAC Reinforcement Learning Algorithm
    Zehfroosh, Ashkan
    Tanner, Herbert G.
    2020 28TH MEDITERRANEAN CONFERENCE ON CONTROL AND AUTOMATION (MED), 2020, : 788 - 793
  • [8] Conditional Abstraction Trees for Sample-Efficient Reinforcement Learning
    Dadvar, Mehdi
    Nayyar, Rashmeet Kaur
    Srivastava, Siddharth
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2023, 216 : 485 - 495
  • [9] Sample-Efficient Goal-Conditioned Reinforcement Learning via Predictive Information Bottleneck for Goal Representation Learning
    Zou, Qiming
    Suzuki, Einoshin
    2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 9523 - 9529
  • [10] Sample-Efficient Reinforcement Learning for Pose Regulation of a Mobile Robot
    Brescia, Walter
    De Cicco, Luca
    Mascolo, Saverio
    2022 11TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2022, : 42 - 47