Pneumatic artificial muscle-driven robot control using local update reinforcement learning

Cited by: 24

Authors:

Cui, Yunduan [1]

Matsubara, Takamitsu [1]

Sugimoto, Kenji [1]

Affiliations:

[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan

Source:

ADVANCED ROBOTICS | 2017 / Vol. 31 / Issue 08

Keywords:

Smooth policy update; dynamic policy programming; robot motor learning; SEARCH;

DOI:

10.1080/01691864.2016.1274680

Chinese Library Classification number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

In this study, a new value-function-based reinforcement learning (RL) algorithm, Local Update Dynamic Policy Programming (LUDPP), is proposed. It exploits the nature of the smooth policy update using Kullback-Leibler divergence to update its value function locally, which considerably reduces the computational complexity. We first investigated the learning performance of LUDPP and other algorithms without smooth policy update on pendulum swing-up and n-DOF manipulator reaching tasks in simulation. Only LUDPP could efficiently and stably learn good control policies in high-dimensional systems with a limited number of training samples. In a real-world application, we applied LUDPP to control Pneumatic Artificial Muscle (PAM)-driven robots without knowledge of the model, which is challenging for traditional methods due to the high nonlinearities of the PAM's air pressure dynamics and mechanical structure. LUDPP successfully achieved one-finger control of the Shadow Dexterous Hand, a PAM-driven humanoid robot hand, with far lower computational resources than other conventional value-function-based RL algorithms.


Pages: 397-412

Page count: 16

34 references in total

[21] Least-squares policy iteration
Lagoudakis, MG
Parr, R
[J]. JOURNAL OF MACHINE LEARNING RESEARCH, 2004, 4 (06) : 1107 - 1149
[22] Li L., 2009, Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, V2, P733
[23] Low-Speed Longitudinal Controllers for Mass-Produced Cars: A Comparative Study
Milanes, Vicente
Villagra, Jorge
Perez, Joshue
Gonzalez, Carlos
[J]. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2012, 59 (01) : 620 - 628
[24] Natural Actor-Critic
Peters, Jan
Schaal, Stefan
[J]. NEUROCOMPUTING, 2008, 71 (7-9) : 1180 - 1190
[25] Peters J, 2010, AAAI CONF ARTIF INTE, P1607
[26] Sawicki GS, 2005, INT C REHAB ROBOT, P206
[27] Dynamic movement primitives - A framework for motor control in humans and humanoid robotics
Schaal, S
[J]. ADAPTIVE MOTION OF ANIMALS AND MACHINES, 2006, : 261 - 280
[28] Learning Control in Robotics Trajectory-Based Optimal Control Techniques
Schaal, Stefan
Atkeson, Christopher G.
[J]. IEEE ROBOTICS & AUTOMATION MAGAZINE, 2010, 17 (02) : 20 - 29
[29] Shen Y., 2006, P 19 ANN C NEUR INF, P1225
[30] Sutton R., 1998, Introduction to reinforcement learning
