Pneumatic artificial muscle-driven robot control using local update reinforcement learning

Cited by: 24
Authors
Cui, Yunduan [1]
Matsubara, Takamitsu [1]
Sugimoto, Kenji [1]
Affiliations
[1] Nara Inst Sci & Technol, Grad Sch Informat Sci, Nara, Japan
Keywords
Smooth policy update; dynamic policy programming; robot motor learning; SEARCH;
DOI
10.1080/01691864.2016.1274680
Chinese Library Classification (CLC) number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
In this study, a new value-function-based reinforcement learning (RL) algorithm, Local Update Dynamic Policy Programming (LUDPP), is proposed. It exploits the smooth policy update obtained using the Kullback-Leibler divergence to update its value function locally, which considerably reduces the computational complexity. We first investigated the learning performance of LUDPP and of other algorithms without smooth policy update on pendulum swing-up and n-DOF manipulator reaching tasks in simulation. Only LUDPP could efficiently and stably learn good control policies in high-dimensional systems with a limited number of training samples. In a real-world application, we applied LUDPP to control robots driven by Pneumatic Artificial Muscles (PAMs) without knowledge of the model, which is challenging for traditional methods because of the high nonlinearities of the PAMs' air pressure dynamics and mechanical structure. LUDPP successfully achieved one-finger control of the Shadow Dexterous Hand, a PAM-driven humanoid robot hand, with far lower computational resources than other conventional value-function-based RL algorithms.
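The abstract does not give the update rule itself. As a rough illustration of what a smooth, Kullback-Leibler-regularized policy update can look like, the sketch below applies the standard tabular Dynamic Policy Programming preference update to a toy problem. It is not LUDPP and does not include its local update. The two-state MDP, the transition table `P`, the reward table `R`, and the values of `gamma` and `eta` are all assumptions made up for this example.

```python
import math

def boltzmann_avg(prefs, eta):
    """Boltzmann-weighted average of one state's action preferences."""
    m = max(prefs)  # subtract the max before exponentiating for stability
    weights = [math.exp(eta * (p - m)) for p in prefs]
    return sum(w * p for w, p in zip(weights, prefs)) / sum(weights)

def softmax_policy(prefs, eta):
    """Policy proportional to exp(eta * preference)."""
    m = max(prefs)
    weights = [math.exp(eta * (p - m)) for p in prefs]
    z = sum(weights)
    return [w / z for w in weights]

def dpp_sweep(psi, P, R, gamma, eta):
    """One synchronous DPP update of the preferences psi[s][a]."""
    n_states = len(psi)
    avg = [boltzmann_avg(psi[s], eta) for s in range(n_states)]
    new_psi = []
    for s in range(n_states):
        row = []
        for a in range(len(psi[s])):
            expected_next = sum(P[s][a][s2] * avg[s2] for s2 in range(n_states))
            row.append(psi[s][a] - avg[s] + R[s][a] + gamma * expected_next)
        new_psi.append(row)
    return new_psi

# Hypothetical toy MDP: 2 states, 2 actions, deterministic transitions.
# P[s][a][s2] is the probability of reaching s2 from s under action a.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[1.0, 0.0], [0.0, 1.0]],  # state 1: action 0 moves to state 0, action 1 stays
]
R = [[0.0, 0.0], [0.0, 1.0]]   # only staying in state 1 is rewarded
gamma, eta = 0.9, 1.0          # assumed discount factor and inverse temperature

psi = [[0.0, 0.0], [0.0, 0.0]]
for _ in range(50):
    psi = dpp_sweep(psi, P, R, gamma, eta)

policy = [softmax_policy(psi[s], eta) for s in range(len(psi))]
print(policy)
```

In this toy setting the preferences for action 1 grow relative to action 0 in both states, so the soft-max policy shifts toward the rewarded behavior gradually rather than jumping to a greedy choice after one update.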
Pages: 397-412
Number of pages: 16