Robot manipulation skills learning for sparse rewards

Cited by: 1
Authors
Wu P.-L. [1 ,2 ]
Zhang Y. [1 ,2 ]
Mao B.-Y. [1 ,2 ]
Chen W.-B. [3 ]
Gao G.-W. [3 ]
Affiliations
[1] School of Information Science and Engineering, Yanshan University, Hebei, Qinhuangdao
[2] The Key Laboratory for Computer Virtual Technology, System Integration of Hebei Province, Hebei, Qinhuangdao
[3] School of Automation, Beijing Information Science and Technology University, Beijing
Source
Kongzhi Lilun Yu Yingyong/Control Theory and Applications | 2024, Vol. 41, No. 01
Funding
National Natural Science Foundation of China
Keywords
adaptive temperature parameters; maximum entropy methods; meta-learning; reinforcement learning; robot manipulation skills learning; sparse reward;
DOI
10.7641/CTA.2022.20121
Abstract
Robot manipulation skills learning based on deep reinforcement learning has become a research hotspot; however, because rewards in manipulation tasks are sparse, learning efficiency is low. This paper proposes a double experience replay buffer adaptive soft hindsight experience replay (DAS-HER) algorithm based on meta-learning and applies it to manipulation skills learning with sparse rewards. First, building on the soft hindsight experience replay (SHER) algorithm, a simplified value function that improves the algorithm's efficiency is derived, and an adaptive temperature-adjustment strategy is introduced that dynamically tunes the temperature parameter to suit different task environments. Second, combining meta-learning, the experience replay buffer is segmented so that the ratio of real sampled data to constructed virtual data is adjusted dynamically during training, yielding the DAS-HER algorithm. Third, a general framework for robot manipulation skills learning under sparse rewards is constructed, and the DAS-HER algorithm is applied within it. Finally, comparative experiments on eight tasks in the Fetch and Hand environments under MuJoCo show that the proposed algorithm outperforms the other algorithms in both training efficiency and success rate. © 2024 South China University of Technology. All rights reserved.
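Two ingredients named in the abstract, hindsight goal relabeling and an entropy-driven adaptive temperature, can be illustrated with a minimal NumPy sketch. This is an assumption-based illustration (the function names, the "future" relabeling strategy, and the SAC-style dual temperature update are our assumptions), not the paper's DAS-HER implementation.

```python
import numpy as np

def her_relabel(episode, reward_fn, k=4, rng=None):
    """Hindsight relabeling ('future' strategy): for each transition, store
    the original tuple plus k copies whose goal is replaced by an achieved
    goal from a later step, with the sparse reward recomputed."""
    rng = rng or np.random.default_rng(0)
    relabeled = []
    T = len(episode)
    for t, (obs, action, achieved, goal) in enumerate(episode):
        relabeled.append((obs, action, goal, reward_fn(achieved, goal)))
        for _ in range(k):
            future = rng.integers(t, T)        # index of a step at or after t
            new_goal = episode[future][2]      # that step's achieved goal
            relabeled.append((obs, action, new_goal,
                              reward_fn(achieved, new_goal)))
    return relabeled

def adapt_temperature(log_alpha, log_probs, target_entropy, lr=1e-3):
    """One gradient step on the temperature so policy entropy is driven
    toward target_entropy (SAC-style dual update on log_alpha)."""
    # gradient of E[-alpha * (log_pi + target_entropy)] w.r.t. log_alpha
    grad = -np.mean(log_probs + target_entropy) * np.exp(log_alpha)
    return log_alpha - lr * grad

# sparse reward: 0 if the achieved goal is close enough to the goal, else -1
reward = lambda ag, g: 0.0 if abs(ag - g) < 0.05 else -1.0
# toy 1-D episode: (obs, action, achieved_goal, desired_goal); goal never reached
episode = [(0.0, 0.1, 0.2, 1.0), (0.2, 0.1, 0.4, 1.0), (0.4, 0.1, 0.6, 1.0)]
batch = her_relabel(episode, reward, k=2)   # 3 original + 6 relabeled tuples
```

In the toy episode every original transition carries reward -1, but relabeling against future achieved goals injects reward-0 transitions, which is what makes sparse-reward learning tractable; the temperature step lowers alpha whenever the policy's entropy exceeds the target.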
Pages: 99-108
Page count: 9
References
18 in total
[1]
PETERS J, SCHAAL S., Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, 4, pp. 682-697, (2008)
[2]
CHEBOTAR Y, KALAKRISHNAN M, YAHYA A, et al., Path integral guided policy search, Proceedings of IEEE International Conference on Robotics and Automation, pp. 3381-3388, (2017)
[3]
ANDRYCHOWICZ M, BAKER B, CHOCIEJ M, et al., Learning dexterous in-hand manipulation, The International Journal of Robotics Research, 39, 1, pp. 3-20, (2020)
[4]
LI H, KUMAR N, CHEN R, et al., A deep reinforcement learning framework for identifying funny scenes in movies, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3116-3120, (2018)
[5]
YANG Rui, YAN Jiangpeng, LI Xiu, Research on reinforcement learning sparse reward algorithm, CAAI Transactions on Intelligent Systems, 15, 5, pp. 888-899, (2020)
[6]
GULLAPALLI V, BARTO A G., Shaping as a method for accelerating reinforcement learning, Proceedings of IEEE International Symposium on Intelligent Control, pp. 554-559, (1992)
[7]
HUSSEIN A, GABER M M, ELYAN E, et al., Imitation learning: A survey of learning methods, ACM Computing Surveys, 50, 2, pp. 1-35, (2017)
[8]
BENGIO Y, LOURADOUR J, COLLOBERT R, et al., Curriculum learning, Proceedings of International Conference on Machine Learning, pp. 41-48, (2009)
[9]
ANDRYCHOWICZ M, WOLSKI F, RAY A, et al., Hindsight experience replay, Advances in Neural Information Processing Systems, pp. 5048-5058, (2017)
[10]
REIZINGER P, SZEMENYEI M., Attention-based curiosity-driven exploration in deep reinforcement learning, Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 3542-3546, (2020)