Visual Reinforcement Learning With Self-Supervised 3D Representations

Times cited: 10

Authors:

Ze, Yanjie [1,2]

Hansen, Nicklas [2]

Chen, Yinbo [2]

Jain, Mohit [2]

Wang, Xiaolong [2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai 200240, Peoples R China

[2] Univ Calif San Diego, San Diego, CA 92093 USA

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2023 / Vol. 8 / No. 05

Keywords:

Three-dimensional displays; Task analysis; Visualization; Cameras; Representation learning; Training; Robot vision systems; Reinforcement learning; representation learning; deep learning for visual perception;

DOI:

10.1109/LRA.2023.3259681

Chinese Library Classification (CLC) number:

TP24 [Robotics technology];

Discipline classification code:

080202 ; 1405 ;

Abstract:

A prominent approach to visual Reinforcement Learning (RL) is to learn an internal state representation using self-supervised methods, which has the potential benefit of improved sample-efficiency and generalization through additional learning signal and inductive biases. However, while the real world is inherently 3D, prior efforts have largely been focused on leveraging 2D computer vision techniques as auxiliary self-supervision. In this work, we present a unified framework for self-supervised learning of 3D representations for motor control. Our proposed framework consists of two phases: a pretraining phase where a deep voxel-based 3D autoencoder is pretrained on a large object-centric dataset, and a finetuning phase where the representation is jointly finetuned together with RL on in-domain data. We empirically show that our method enjoys improved sample efficiency compared to 2D representation learning methods. Additionally, our learned policies transfer zero-shot to a real robot setup with only approximate geometric correspondence, and successfully solve motor control tasks that involve grasping and lifting from a single, uncalibrated RGB camera.
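The abstract does not spell out how a voxel-based encoder turns a single RGB image into a 3D representation. One construction used in some deep-voxel autoencoders, assumed here purely for illustration and not confirmed by this record to be the paper's design, reshapes the channel axis of a 2D CNN feature map into a depth axis, yielding a feature volume that can then be pooled into a compact state vector for an RL policy. The NumPy sketch below shows only that reshaping and pooling step; the function names `lift_to_voxels` and `pool_voxels` are hypothetical.

```python
import numpy as np


def lift_to_voxels(feat_2d: np.ndarray, depth: int) -> np.ndarray:
    """Reshape a (C*D, H, W) 2D feature map into a (C, D, H, W) voxel grid.

    Hypothetical helper (an assumption, not the paper's code): it assumes the
    2D encoder emits C*D channels, so each consecutive group of `depth`
    channels becomes the slices of one voxel channel along a depth axis.
    """
    channels_times_depth, height, width = feat_2d.shape
    if channels_times_depth % depth != 0:
        raise ValueError("channel count must be divisible by depth")
    channels = channels_times_depth // depth
    # C-order reshape: input channel (c * depth + d) becomes voxels[c, d]
    return feat_2d.reshape(channels, depth, height, width)


def pool_voxels(voxels: np.ndarray) -> np.ndarray:
    """Average over the (D, H, W) axes, giving one C-dim state vector."""
    return voxels.mean(axis=(1, 2, 3))


if __name__ == "__main__":
    # Toy feature map: 2 voxel channels x 4 depth slices on a 3x3 grid
    feat = np.arange(2 * 4 * 3 * 3, dtype=np.float32).reshape(8, 3, 3)
    vox = lift_to_voxels(feat, depth=4)
    state = pool_voxels(vox)
    print(vox.shape, state.shape)  # (2, 4, 3, 3) (2,)
```

Global average pooling is used here only to keep the example small; a real encoder of this kind would more plausibly process the volume with 3D convolutions before producing a state representation.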


Pages: 2890-2897

Number of pages: 8

References (44 in total):

[31] Nair S, 2022, Arxiv, DOI arXiv:2203.12601
[32] Burgess CP, 2019, Arxiv, DOI arXiv:1901.11390
[33] Parisi S, 2022, PR MACH LEARN RES
[34] Qi H, 2021, PROC INT C LEARN REP, P1
[35] Common Objects in 3D: Large-Scale Learning and Evaluation of Real-life 3D Category Reconstruction
Reizenstein, Jeremy; Shapovalov, Roman; Henzler, Philipp; Sbordone, Luca; Labatut, Patrick; Novotny, David
[J]. 2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021: 10881-10891
[36] Silver D, 2017, Arxiv, DOI arXiv:1712.01815
[37] DROID: Minimizing the Reality Gap Using Single-Shot Human Demonstration
Tsai, Ya-Yen; Xu, Hui; Ding, Zihan; Zhang, Chong; Johns, Edward; Huang, Bidan
[J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6(02): 3168-3175
[38] Learning Spatial Common Sense with Geometry-Aware Recurrent Networks
Tung, Hsiao-Yu Fish; Cheng, Ricson; Fragkiadaki, Katerina
[J]. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 2590-2598
[39] LEAST-SQUARES ESTIMATION OF TRANSFORMATION PARAMETERS BETWEEN 2 POINT PATTERNS
UMEYAMA, S
[J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1991, 13(04): 376-380
[40] Wang K, 2020, ADV NEURAL INFORM PR, V33
