Efficient Multi-Goal Reinforcement Learning via Value Consistency Prioritization

Cited by: 0
Authors
Xu J. [1 ]
Li S. [1 ]
Yang R. [2 ]
Yuan C. [1 ]
Han L. [3 ]
Affiliations
[1] Tsinghua Shenzhen International Graduate School, Shenzhen, Guangdong
[2] The Hong Kong University of Science and Technology, Hong Kong
[3] Tencent Robotics X, Shenzhen, Guangdong
Open Access
All Open Access; Gold
DOI
10.1613/jair.1.14398
Abstract
Goal-conditioned reinforcement learning (RL) with sparse rewards remains a challenging problem in deep RL. Hindsight Experience Replay (HER) has been demonstrated to be an effective solution; it replaces the desired goals in failed experiences with actually achieved states. Existing approaches mainly focus on either exploration or exploitation to improve the performance of HER. From a joint perspective, exploiting specific past experiences can also implicitly drive exploration. Therefore, we concentrate on prioritizing both original and relabeled samples for efficient goal-conditioned RL. To achieve this, we propose a novel value consistency prioritization (VCP) method, where the priority of a sample is determined by the consistency of its ensemble Q-values. This distinguishes VCP from most existing prioritization approaches, which prioritize samples based on the uncertainty of ensemble Q-values. Through extensive experiments, we demonstrate that VCP achieves significantly higher sample efficiency than existing algorithms on a range of challenging goal-conditioned manipulation tasks. We also visualize how VCP prioritizes good experiences to enhance policy learning. © 2023 AI Access Foundation. All rights reserved.
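The core idea in the abstract (prioritize replay samples whose ensemble Q-estimates agree, rather than those where they disagree) can be illustrated in a few lines. The sketch below is not the paper's actual algorithm; the function name vcp_priorities, the use of the ensemble standard deviation as the disagreement measure, the softmax weighting, and the temperature parameter are all assumptions made for this illustration.

```python
import numpy as np

def vcp_priorities(ensemble_q, temperature=1.0):
    """Toy priority computation in the spirit of value consistency
    prioritization: samples whose ensemble Q-estimates agree closely
    receive higher replay priority. Illustrative sketch only; the
    paper's exact priority formula may differ.

    ensemble_q: array of shape (n_ensemble, batch_size), one Q-value
    estimate per ensemble member for each replay sample.
    """
    # Per-sample disagreement of the ensemble: std over the ensemble axis.
    disagreement = ensemble_q.std(axis=0)
    # Consistency is the negation of disagreement; a softmax turns it
    # into a sampling distribution over the batch.
    scores = -disagreement / temperature
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Usage: 5 ensemble members, 4 replay samples. The last sample has
# widely spread Q-estimates, so it receives the lowest priority.
q = np.array([[1.0, 2.0, 0.5, 3.0],
              [1.1, 2.4, 0.4, 1.0],
              [0.9, 1.8, 0.6, 5.0],
              [1.0, 2.2, 0.5, 2.0],
              [1.0, 2.1, 0.5, 4.0]])
print(vcp_priorities(q))
```

Note the contrast with uncertainty-based prioritization mentioned in the abstract, which would instead assign higher priority to samples with larger ensemble disagreement.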
Pages: 355–376
Number of pages: 21