Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

Cited by: 0
Authors
Chebotar, Yevgen [1 ]
Vuong, Quan [1 ]
Irpan, Alex [1 ]
Hausman, Karol [1 ]
Xia, Fei [1 ]
Lu, Yao [1 ]
Kumar, Aviral [1 ]
Yu, Tianhe [1 ]
Herzog, Alexander [1 ]
Pertsch, Karl [1 ]
Gopalakrishnan, Keerthana [1 ]
Ibarz, Julian [1 ]
Nachum, Ofir [1 ]
Sontakke, Sumedh [1 ]
Salazar, Grecia [1 ]
Tran, Huong T. [1 ]
Peralta, Jodilyn [1 ]
Tan, Clayton [1 ]
Manjunath, Deeksha [1 ]
Singh, Jaspiar [1 ]
Zitkovich, Brianna [1 ]
Jackson, Tomas [1 ]
Rao, Kanishka [1 ]
Finn, Chelsea [1 ]
Levine, Sergey [1 ]
Affiliations
[1] Google DeepMind, London, England
Source
Conference on Robot Learning (CoRL), 2023, Vol. 229
DOI: Not available
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. By discretizing each action dimension and representing the Q-value of each action dimension as separate tokens, we can apply effective high-capacity sequence modeling techniques for Q-learning. We present several design decisions that enable good performance with offline RL training, and show that Q-Transformer outperforms prior offline RL algorithms and imitation learning techniques on a large, diverse real-world robotic manipulation task suite. The project's website and videos can be found at qtransformer.github.io.
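
To make the action-tokenization idea in the abstract concrete, the following is a minimal, hypothetical Python/PyTorch sketch rather than the authors' implementation: it shows per-dimension action discretization and a Q-function that scores one action dimension at a time, conditioned on the state and the bins already chosen for earlier dimensions. A small MLP stands in for the Transformer backbone, and every size and name (N_BINS, ACT_DIMS, STATE_DIM, AutoregressiveQ, and so on) is an illustrative assumption.

# Minimal sketch (not the authors' code): per-dimension action discretization
# and an autoregressive Q-function that scores one action dimension at a time.
# An MLP stands in for the Transformer backbone; all sizes are assumptions.
import torch
import torch.nn as nn

N_BINS = 256      # discrete bins per action dimension (assumed)
ACT_DIMS = 8      # number of action dimensions (assumed)
STATE_DIM = 64    # size of the encoded state/observation feature (assumed)


def discretize(action, low=-1.0, high=1.0, n_bins=N_BINS):
    """Map continuous actions in [low, high] to integer bin indices."""
    idx = ((action - low) / (high - low) * (n_bins - 1)).round().long()
    return idx.clamp(0, n_bins - 1)


class AutoregressiveQ(nn.Module):
    """Predicts Q-values for action dimension `dim_index` given the state and
    the bins already chosen for dimensions 0..dim_index-1 (zeros elsewhere)."""

    def __init__(self):
        super().__init__()
        self.bin_embed = nn.Embedding(N_BINS, 32)
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM + ACT_DIMS * 32 + ACT_DIMS, 256),
            nn.ReLU(),
            nn.Linear(256, N_BINS),  # one Q-value per candidate bin
        )

    def forward(self, state, prev_bins, dim_index):
        # prev_bins: (B, ACT_DIMS) long tensor; only dims < dim_index are set.
        batch = state.shape[0]
        emb = self.bin_embed(prev_bins).flatten(1)            # (B, ACT_DIMS*32)
        onehot = torch.zeros(batch, ACT_DIMS, device=state.device)
        onehot[:, dim_index] = 1.0                            # which dim to score
        return self.net(torch.cat([state, emb, onehot], dim=-1))


@torch.no_grad()
def greedy_action(qnet, state):
    """Maximize the Q-function one action dimension at a time."""
    batch = state.shape[0]
    bins = torch.zeros(batch, ACT_DIMS, dtype=torch.long, device=state.device)
    for d in range(ACT_DIMS):
        q = qnet(state, bins, d)          # (B, N_BINS) Q-values for dimension d
        bins[:, d] = q.argmax(dim=-1)     # pick the best bin for dimension d
    return bins


if __name__ == "__main__":
    # Discretize a continuous action into bin indices.
    print(discretize(torch.tensor([[-1.0, 0.0, 0.5]])))
    # Select a discretized action greedily, one dimension at a time.
    qnet = AutoregressiveQ()
    print(greedy_action(qnet, torch.randn(4, STATE_DIM)))

In this sketch, the greedy per-dimension loop is what keeps maximization over the discretized action space tractable: instead of one argmax over N_BINS**ACT_DIMS joint actions, each dimension is resolved with a single N_BINS-way argmax conditioned on the choices made so far.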
Pages: 20