A policy gradient algorithm integrating long and short-term rewards for soft continuum arm control

Cited by: 6
Authors
Dong Xiang [1 ]
Zhang Jing [1 ]
Cheng Long [2 ]
Xu WenJun [3 ]
Su Hang [3 ]
Mei Tao [3 ]
Affiliations
[1] Anhui Univ, Sch Elect Engn & Automat, Hefei 230601, Peoples R China
[2] Chinese Acad Sci, Inst Automat, State Key Lab Control & Management Complex Syst, Beijing 100190, Peoples R China
[3] Peng Cheng Lab, Robot Res Ctr, Shenzhen 518055, Peoples R China
Keywords
soft arm control; Cosserat rod; deep reinforcement learning; policy gradient algorithm; high sample complexity;
DOI
10.1007/s11431-022-2063-8
Chinese Library Classification
T [Industrial Technology]
Discipline Code
08
Abstract
The soft continuum arm has extensive applications in industrial production and human life due to its superior safety and flexibility. Reinforcement learning is a powerful technique for solving the continuous control problems of soft arms, since it can learn an effective control policy without a known system model. However, it often suffers from high sample complexity and requires huge amounts of training data, which limits its effectiveness in soft arm control. An improved policy gradient method, policy gradient integrating long and short-term rewards, denoted PGLS, is proposed in this paper to overcome this issue. The short-term rewards provide more dynamics-aware exploration directions for policy learning and improve the exploration efficiency of the algorithm. PGLS can be integrated into current policy gradient algorithms, such as deep deterministic policy gradient (DDPG). The overall control framework is realized and demonstrated in a dynamics simulation environment. Simulation results show that this approach can effectively control the soft arm to reach and track targets. Compared with DDPG and other model-free reinforcement learning algorithms, the proposed PGLS algorithm shows substantial improvements in convergence speed and performance. In addition, a fluid-driven soft manipulator is designed and fabricated in this paper, which will allow the proposed PGLS algorithm to be verified in real experiments in the future.
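The abstract describes combining short-term (immediate) rewards with the usual long-term return inside a DDPG-style update. A minimal sketch of one way such a blend could look, assuming a one-step TD target and a fixed blending weight `beta`; both the blending form and all names here are illustrative assumptions, not details taken from the paper:

```python
def blended_td_target(r, q_next, gamma=0.99, beta=0.3):
    """Illustrative blend of long- and short-term reward signals.

    r      : immediate (short-term) reward r_t
    q_next : critic estimate Q(s_{t+1}, mu(s_{t+1}))
    gamma  : discount factor
    beta   : blending weight (assumed value, not from the paper)
    """
    long_term = r + gamma * q_next   # standard DDPG one-step TD target
    short_term = r                   # immediate reward as a dynamics-aware signal
    return (1.0 - beta) * long_term + beta * short_term

# Example: r = 1.0, Q(s', mu(s')) = 2.0
# target = 0.7 * (1.0 + 0.99 * 2.0) + 0.3 * 1.0 = 2.386
```

In this sketch, a larger `beta` weights the critic target toward the immediate reward, which would bias early exploration toward states with good short-term feedback; the paper's actual integration scheme may differ.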
Pages: 2409-2419 (11 pages)
Cited References
35 in total
[1] Bellemare MG, 2016, Advances in Neural Information Processing Systems, Vol. 29
[2] Boedecker, 2017, Conference on Robot Learning, p. 195
[3] Coevoet E, Morales-Bieze T, Largilliere F, Zhang Z, Thieffry M, Sanz-Lopez M, Carrez B, Marchal D, Goury O, Dequidt J, Duriez C. Software toolkit for modeling, simulation, and control of soft robots. Advanced Robotics, 2017, 31(22): 1208-1224
[4] Engel Y, 2005, Advances in Neural Information Processing Systems, p. 347
[5] Feinberg V, arXiv:1803.00101
[6] Fujimoto S, 2018, Proceedings of Machine Learning Research, Vol. 80
[7] Gazzola M, Dudte LH, McCormick AG, Mahadevan L. Forward and inverse problems in the mechanics of soft filaments. Royal Society Open Science, 2018, 5(6)
[8] Goury O, Duriez C. Fast, Generic, and Reliable Control and Simulation of Soft Robots Using Model Order Reduction. IEEE Transactions on Robotics, 2018, 34(6): 1565-1576
[9] Guan Qinghua, 2020, Scientia Sinica Technologica, Vol. 50, p. 897
[10] Kang R, Branson DT, Guglielmino E, Caldwell DG. Dynamic modeling and control of an octopus inspired multiple continuum arm robot. Computers & Mathematics with Applications, 2012, 64(5): 1004-1016