Reinforcement learning with model-based feedforward inputs for robotic table tennis

Authors
Hao Ma
Dieter Büchler
Bernhard Schölkopf
Michael Muehlebach
Affiliations
[1] Max Planck Institute for Intelligent Systems, Learning and Dynamical Systems
[2] Max Planck Institute for Intelligent Systems, Empirical Inference
Source
Autonomous Robots | 2023, Vol. 47
Keywords
Reinforcement learning; Iterative learning control; Supervised learning; Table tennis robot; Pneumatic artificial muscle; Soft robotics
Abstract
We rethink the traditional reinforcement learning approach, which is based on optimizing over feedback policies, and propose a new framework that optimizes over feedforward inputs instead. This not only mitigates the risk of destabilizing the system during training but also reduces the bulk of the learning to a supervised learning task. As a result, efficient and well-understood supervised learning techniques can be applied and tuned on a validation data set. The labels are generated with a variant of iterative learning control, which also incorporates prior knowledge about the underlying dynamics. Our framework is applied to intercepting and returning ping-pong balls that are played to a four-degrees-of-freedom robotic arm in real-world experiments. The robot arm is driven by pneumatic artificial muscles, which makes the control and learning tasks challenging. We highlight the potential of our framework by comparing it to a reinforcement learning approach that optimizes over feedback policies. Our framework achieves a higher success rate for the returns (100% vs. 96% over 107 consecutive trials; see https://youtu.be/kR9jowEH7PY) while requiring only about one tenth of the samples during training. We also find that our approach is able to deal with a variety of different incoming ball trajectories.
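The abstract describes generating supervised-learning labels with a variant of iterative learning control (ILC): the same task is repeated, the tracking error of each trial corrects the feedforward input for the next, and the converged input serves as a label. The sketch below is a minimal illustration of that general idea only, not the paper's algorithm; the function names, the scalar learning gain, the one-step-delay handling, and the toy plant are all hypothetical assumptions, whereas the paper's variant additionally uses a model of the dynamics.

```python
import numpy as np

# Minimal, hypothetical sketch of label generation via first-order
# iterative learning control (ILC). All names, gains, and the toy plant
# are illustrative assumptions, not the authors' implementation.

def ilc_refine_feedforward(run_trial, reference, u_init,
                           learning_gain=0.5, delay=1, n_iterations=20):
    """Refine a feedforward input sequence by repeating one trial.

    run_trial: callable mapping an input sequence (T, m) to the measured
               output sequence (T, p) of a single rollout on the system.
    reference: desired output trajectory, shape (T, p).
    u_init:    initial feedforward guess, shape (T, m); m == p assumed.
    delay:     assumed input-output delay of the plant, in samples.
    """
    u = u_init.copy()
    for _ in range(n_iterations):
        y = run_trial(u)        # execute one rollout on the system
        error = reference - y   # tracking error over the whole trial
        # Shift the error by the plant delay so each input sample is
        # corrected by the error it actually causes. A model-based ILC
        # variant would shape this update with approximate inverse
        # dynamics instead of a scalar gain.
        u[:len(u) - delay] += learning_gain * error[delay:]
    return u  # converged input can serve as a supervised-learning label


if __name__ == "__main__":
    # Toy check on a trivial plant: output lags the input by one step.
    T = 50
    ref = np.sin(np.linspace(0.0, 2.0 * np.pi, T)).reshape(T, 1)

    def run_trial(u):
        y = np.zeros_like(u)
        y[1:] = 0.8 * u[:-1]  # one-step delay with gain 0.8
        return y

    u_star = ilc_refine_feedforward(run_trial, ref, np.zeros((T, 1)))
    rms = float(np.sqrt(np.mean((ref - run_trial(u_star)) ** 2)))
    print("final RMS tracking error:", rms)
```

For this toy plant the per-sample error contracts by a constant factor each iteration, so a handful of repetitions already yields an input that tracks the reference closely; on real hardware the gain and error shaping would have to respect the actual dynamics.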
Pages: 1387–1403 (16 pages)