A deep reinforcement learning algorithm to control a two-wheeled scooter with a humanoid robot

Cited by: 7
Authors
Baltes, Jacky [1]
Christmann, Guilherme [1]
Saeedvand, Saeed [1]
Affiliations
[1] Natl Taiwan Normal Univ, Dept Elect Engn, Taipei, Taiwan
Keywords
Deep reinforcement learning; Proximal policy optimization (PPO); Two-wheeled vehicles; PID control; Humanoid robotics
DOI
10.1016/j.engappai.2023.106941
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Balancing a two-wheeled scooter is considered a challenging task for robots, as it is a non-linear control problem in a highly dynamic environment. The rapid pace of development of deep reinforcement learning has enabled robots to perform complex control tasks. In this paper, a deep reinforcement learning algorithm is proposed to learn the steering control of the scooter for balancing and path tracking using an unmodified humanoid robot. Two control strategies are developed, analyzed, and compared: a classical Proportional-Integral-Derivative (PID) controller and a Deep Reinforcement Learning (DRL) controller based on the Proximal Policy Optimization (PPO) algorithm. The ability of the robot to balance the scooter using both approaches is extensively evaluated. Challenging control scenarios are tested at low scooter speeds of 2.5, 5, and 10 km/h. Steering velocities are also varied over 10, 20, and 40 rad/s. The evaluations include upright balance without disturbances, upright balance under disturbances, tracking a sinusoidal path, and path tracking. A 3D model of the humanoid robot and scooter system is developed and simulated in a state-of-the-art GPU-based simulation environment (NVIDIA's Isaac Gym), which serves as the training and test bed. Although the PID controller successfully balances the robot, better final results are achieved with the proposed DRL controller. On average across the tested speeds, the results indicate a 52% improvement, with better performance in path tracking control. An evaluation of controller commands on the real robot and scooter indicates that the robot is fully capable of realizing the required steering control velocities.
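The abstract names a classical PID controller as the baseline for steering control but does not give its form. The following is only a minimal sketch of what such a baseline could look like, not the authors' implementation: the class name `PIDSteering`, the choice of roll angle as the controlled variable, the sign convention, and all gain values are assumptions made for illustration. The only quantity taken from the abstract is the idea of a steering velocity limit (10, 20, or 40 rad/s).

```python
from dataclasses import dataclass


@dataclass
class PIDSteering:
    """Hypothetical PID controller: roll error in, steering velocity out."""
    kp: float          # proportional gain (assumed value, not from the paper)
    ki: float          # integral gain (assumed value, not from the paper)
    kd: float          # derivative gain (assumed value, not from the paper)
    max_rate: float    # steering velocity limit in rad/s, e.g. 10, 20, or 40
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, roll: float, target_roll: float, dt: float) -> float:
        """Return a steering velocity command clamped to +/- max_rate."""
        error = target_roll - roll
        self.integral += error * dt                     # accumulate error
        derivative = (error - self.prev_error) / dt     # finite difference
        self.prev_error = error
        command = (self.kp * error
                   + self.ki * self.integral
                   + self.kd * derivative)
        # Saturate the command at the steering velocity limit.
        return max(-self.max_rate, min(self.max_rate, command))
```

In a simulation loop, `step` would be called once per control tick with the measured roll angle; the DRL controller described in the abstract would replace this function with a learned policy that outputs the same kind of steering command.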
Pages: 20