Visual Reinforcement Learning With Self-Supervised 3D Representations

Cited by: 10
Authors
Ze, Yanjie [1, 2]
Hansen, Nicklas [2 ]
Chen, Yinbo [2 ]
Jain, Mohit [2 ]
Wang, Xiaolong [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China
[2] Univ Calif San Diego, San Diego, CA 92093 USA
Keywords
Three-dimensional displays; Task analysis; Visualization; Cameras; Representation learning; Training; Robot vision systems; Reinforcement learning; representation learning; deep learning for visual perception
DOI
10.1109/LRA.2023.3259681
Chinese Library Classification (CLC) number
TP24 [Robotics]
Discipline classification code
080202; 1405
Abstract
A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for self-supervised learning of 3D representations for motor control. Our proposed framework consists of two phases: a pretraining phase where a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and a finetuning phase where the representation is jointly finetuned together with RL on in-domain data. We empirically show that our method enjoys improved sample efficiency compared to 2D representation learning methods. Additionally, our learned policies transfer zero-shot to a real robot setup with only approximate geometric correspondence, and successfully solve motor control tasks that involve grasping and lifting from a single, uncalibrated RGB camera.
Pages: 2890-2897
Number of pages: 8
Related papers
50 in total
  • [21] Audio-guided self-supervised learning for disentangled visual speech representations
    Feng, Dalu
    Yang, Shuang
    Shan, Shiguang
    Chen, Xilin
    FRONTIERS OF COMPUTER SCIENCE, 2024, 18 (06)
  • [22] Self-Supervised Learning of Smart Contract Representations
    Yang, Shouliang
    Gu, Xiaodong
    Shen, Beijun
    30TH IEEE/ACM INTERNATIONAL CONFERENCE ON PROGRAM COMPREHENSION (ICPC 2022), 2022, : 82 - 93
  • [23] Audio-guided self-supervised learning for disentangled visual speech representations
    FENG Dalu
    YANG Shuang
    SHAN Shiguang
    CHEN Xilin
    Frontiers of Computer Science, 2024, 18 (06)
  • [24] Intrinsically Motivated Self-supervised Learning in Reinforcement Learning
    Zhao, Yue
    Du, Chenzhuang
    Zhao, Hang
    Li, Tiejun
    2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2022), 2022, : 3605 - 3615
  • [25] Self-Supervised Reinforcement Learning for Recommender Systems
    Xin, Xin
    Karatzoglou, Alexandros
    Arapakis, Ioannis
    Jose, Joemon M.
    PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020, : 931 - 940
  • [26] Self-Supervised Learning on 3D Point Clouds by Learning Discrete Generative Models
    Eckart, Benjamin
    Yuan, Wentao
    Liu, Chao
    Kautz, Jan
    2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, : 8244 - 8253
  • [27] Explaining Self-Supervised Image Representations with Visual Probing
    Basaj, Dominika
    Oleszkiewicz, Witold
    Sieradzki, Igor
    Gorszczak, Michal
    Rychalska, Barbara
    Trzcinski, Tomasz
    Zielinski, Bartosz
    PROCEEDINGS OF THE THIRTIETH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2021, 2021, : 592 - 598
  • [28] Spatial-temporal 3D dependency matching with self-supervised deep learning for monocular visual sensing
    Song, Chengqun
    Niu, Maolong
    Liu, Zhaopeng
    Cheng, Jun
    Wang, Peng
    Li, Hongjian
    Hao, Luoying
    NEUROCOMPUTING, 2022, 481 : 11 - 21
  • [29] DeepVIO: Self-supervised Deep Learning of Monocular Visual Inertial Odometry using 3D Geometric Constraints
    Han, Liming
    Lin, Yimin
    Du, Guoguang
    Lian, Shiguo
    2019 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2019, : 6906 - 6913
  • [30] Curriculum Self-Supervised Learning for 3D CT Cardiac Image Segmentation
    Taher, Mohammad Reza Hosseinzadeh
    Ikuta, Masaki
    Soni, Ravi
    MACHINE LEARNING FOR HEALTH, ML4H, VOL 225, 2023, 225 : 145 - 156