HiER: Highlight Experience Replay for Boosting Off-Policy Reinforcement Learning Agents

Cited by: 0
Authors
Horvath, Daniel [1,2,3]
Martin, Jesus Bujalance [1]
Erdos, Ferenc Gabor [2]
Istenes, Zoltan [3]
Moutarde, Fabien [1]
Affiliations
[1] PSL Univ, Ctr Robot, Mines Paris, F-75272 Paris, France
[2] Hungarian Res Network, Inst Comp Sci & Control, Ctr Excellence Prod Informat & Control, H-1111 Budapest, Hungary
[3] Eotvos Lorand Univ, CoLocat Ctr Acad & Ind Cooperat, H-1117 Budapest, Hungary
Source
IEEE ACCESS | 2024, Vol. 12
Keywords
Task analysis; Data collection; Training; Robots; Standards; Random variables; Process control; Curriculum development; Curriculum learning; experience replay; reinforcement learning; robotics
DOI
10.1109/ACCESS.2024.3427012
CLC number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Even though reinforcement-learning-based algorithms have achieved superhuman performance in many domains, the field of robotics poses significant challenges, as the state and action spaces are continuous and the reward function is predominantly sparse. Furthermore, in many cases the agent has no access to any form of demonstration. Inspired by human learning, in this work we propose a method named highlight experience replay (HiER), which creates a secondary highlight replay buffer for the most relevant experiences. For the weight updates, transitions are sampled from both the standard and the highlight experience replay buffers. It can be applied with or without the techniques of hindsight experience replay (HER) and prioritized experience replay (PER). Our method significantly improves the performance of the state of the art, as validated on eight tasks of three robotic benchmarks. Furthermore, to exploit the full potential of HiER, we propose HiER+, in which HiER is enhanced with an arbitrary data-collection curriculum learning method. Our implementation, the qualitative results, and a video presentation are available on the project site: http://www.danielhorvath.eu/hier/.
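To make the dual-buffer mechanism described in the abstract concrete, the following is a minimal Python sketch of the idea: episodes whose return exceeds a threshold are copied into a secondary highlight buffer, and each training mini-batch mixes transitions from both buffers. All names and parameters here (HiERBuffer, lambda_ratio, the return-based highlight criterion) are illustrative assumptions, not the authors' implementation.

    import random
    from collections import deque

    class HiERBuffer:
        """Minimal sketch of the HiER idea: a standard replay buffer plus a
        secondary highlight buffer, with mini-batches mixed from both.
        Illustrative only; not the authors' implementation."""

        def __init__(self, capacity=100_000, hl_capacity=10_000,
                     return_threshold=0.0, lambda_ratio=0.3):
            self.standard = deque(maxlen=capacity)      # standard experience replay buffer
            self.highlight = deque(maxlen=hl_capacity)  # secondary highlight buffer
            self.return_threshold = return_threshold    # episodes above this count as highlights
            self.lambda_ratio = lambda_ratio            # fraction of each batch drawn from highlights

        def store_episode(self, transitions):
            """Store an episode of (s, a, r, s_next, done) tuples; copy it into
            the highlight buffer if its undiscounted return passes the threshold."""
            self.standard.extend(transitions)
            episode_return = sum(t[2] for t in transitions)  # reward is at index 2
            if episode_return > self.return_threshold:
                self.highlight.extend(transitions)

        def sample(self, batch_size):
            """Draw a mixed mini-batch from both buffers for the weight update."""
            n_hl = min(int(self.lambda_ratio * batch_size), len(self.highlight))
            batch = random.sample(list(self.highlight), n_hl) if n_hl else []
            batch += random.sample(list(self.standard), batch_size - n_hl)
            return batch

In use, store_episode would be called at the end of each rollout and sample inside the off-policy update loop, so that highlight transitions appear in every batch alongside uniform samples; as the abstract notes, this can be combined with HER relabeling or PER weighting.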
Pages: 100102-100119
Number of pages: 18