Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Cited by: 0
Authors
Chebotar, Yevgen [1 ]
Vuong, Quan [1 ]
Irpan, Alex [1 ]
Hausman, Karol [1 ]
Xia, Fei [1 ]
Lu, Yao [1 ]
Kumar, Aviral [1 ]
Yu, Tianhe [1 ]
Herzog, Alexander [1 ]
Pertsch, Karl [1 ]
Gopalakrishnan, Keerthana [1 ]
Ibarz, Julian [1 ]
Nachum, Ofir [1 ]
Sontakke, Sumedh [1 ]
Salazar, Grecia [1 ]
Tran, Huong T. [1 ]
Peralta, Jodilyn [1 ]
Tan, Clayton [1 ]
Manjunath, Deeksha [1 ]
Singh, Jaspiar [1 ]
Zitkovich, Brianna [1 ]
Jackson, Tomas [1 ]
Rao, Kanishka [1 ]
Finn, Chelsea [1 ]
Levine, Sergey [1 ]
Affiliations
[1] Google DeepMind, London, England
Source
Conference on Robot Learning (CoRL), 2023, Vol. 229
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large, diverse real-world robotic manipulation task suite. The project's website and videos can be found at qtransformer.github.io.
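
To make the action-tokenization idea in the abstract concrete, the following is a minimal, hypothetical Python/PyTorch sketch rather than the authors' implementation: it shows per-dimension action discretization and a Q-function that scores one action dimension at a time, conditioned on the state and the bins already chosen for earlier dimensions. A small MLP stands in for the Transformer backbone, and every size and name (N_BINS, ACT_DIMS, STATE_DIM, AutoregressiveQ, and so on) is an illustrative assumption.

# Minimal sketch (not the authors' code): per-dimension action discretization
# and an autoregressive Q-function that scores one action dimension at a time.
# An MLP stands in for the Transformer backbone; all sizes are assumptions.
import torch
import torch.nn as nn

N_BINS = 256      # discrete bins per action dimension (assumed)
ACT_DIMS = 8      # number of action dimensions (assumed)
STATE_DIM = 64    # size of the encoded state/observation feature (assumed)


def discretize(action, low=-1.0, high=1.0, n_bins=N_BINS):
    """Map continuous actions in [low, high] to integer bin indices."""
    idx = ((action - low) / (high - low) * (n_bins - 1)).round().long()
    return idx.clamp(0, n_bins - 1)


class AutoregressiveQ(nn.Module):
    """Predicts Q-values for action dimension `dim_index` given the state and
    the bins already chosen for dimensions 0..dim_index-1 (zeros elsewhere)."""

    def __init__(self):
        super().__init__()
        self.bin_embed = nn.Embedding(N_BINS, 32)
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIMS * 32 + ACT_DIMS, 256),
            nn.ReLU(),
            nn.Linear(256, N_BINS),  # one Q-value per candidate bin
        )

    def forward(self, state, prev_bins, dim_index):
        # prev_bins: (B, ACT_DIMS) long tensor; only dims < dim_index are set.
        batch = state.shape[0]
        emb = self.bin_embed(prev_bins).flatten(1)            # (B, ACT_DIMS*32)
        onehot = torch.zeros(batch, ACT_DIMS, device=state.device)
        onehot[:, dim_index] = 1.0                            # which dim to score
        return self.net(torch.cat([state, emb, onehot], dim=-1))


@torch.no_grad()
def greedy_action(qnet, state):
    """Maximize the Q-function one action dimension at a time."""
    batch = state.shape[0]
    bins = torch.zeros(batch, ACT_DIMS, dtype=torch.long, device=state.device)
    for d in range(ACT_DIMS):
        q = qnet(state, bins, d)          # (B, N_BINS) Q-values for dimension d
        bins[:, d] = q.argmax(dim=-1)     # pick the best bin for dimension d
    return bins


if __name__ == "__main__":
    # Discretize a continuous action into bin indices.
    print(discretize(torch.tensor([[-1.0, 0.0, 0.5]])))
    # Select a discretized action greedily, one dimension at a time.
    qnet = AutoregressiveQ()
    print(greedy_action(qnet, torch.randn(4, STATE_DIM)))

In this sketch, the greedy per-dimension loop is what keeps maximization over the discretized action space tractable: instead of one argmax over N_BINS**ACT_DIMS joint actions, each dimension is resolved with a single N_BINS-way argmax conditioned on the choices made so far.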
Pages: 20