A Multiobjective Collaborative Deep Reinforcement Learning Algorithm for Jumping Optimization of Bipedal Robot

Cited by: 2
Authors
Tao, Chongben [1 ,2 ]
Li, Mengru [1 ]
Cao, Feng [3 ]
Gao, Zhen [4 ]
Zhang, Zufeng [5 ]
Affiliations
[1] Suzhou Univ Sci & Technol, Sch Elect & Informat Engn, Suzhou 215009, Peoples R China
[2] Tsinghua Univ, Suzhou Automobile Res Inst, Suzhou 215134, Peoples R China
[3] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Peoples R China
[4] McMaster Univ, Fac Engn, Hamilton, ON L8S 0A, Canada
[5] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
bipedal robot; collaborative learning; deep reinforcement learning; experience replay mechanism; jumping;
DOI
10.1002/aisy.202300352
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Due to the nonlinearity and underactuation of bipedal robots, developing efficient jumping strategies remains challenging. To address this, a multiobjective collaborative deep reinforcement learning algorithm based on the actor-critic framework is presented. First, two deep deterministic policy gradient (DDPG) networks are established for training the jumping motion; each focuses on a different objective, and the two collaboratively learn the optimal jumping policy. Next, a recovery experience replay mechanism based on dynamic time warping is integrated into the DDPG networks to improve sample utilization efficiency. Concurrently, a timely adjustment unit is incorporated, which works in tandem with the training frequency to improve the convergence accuracy of the algorithm. In addition, a Markov decision process is designed to handle the complexity and parameter uncertainty of the bipedal robot's dynamic model. Finally, the proposed method is validated on the PyBullet simulation platform. The results show that the method outperforms baseline methods, improving learning speed and enabling robust jumps with greater height and distance. © 2023 WILEY-VCH GmbH
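The abstract does not give implementation details, but the recovery experience replay idea can be illustrated with a short sketch. The snippet below is a hypothetical illustration, not the authors' code: it uses classic dynamic time warping to compare a new episode's state trajectory against reference trajectories from successful jumps, and duplicates transitions from similar episodes so that a DDPG-style replay buffer samples them more often. The names (dtw_distance, RecoveryReplayBuffer, dtw_threshold) and the duplication heuristic are assumptions.

```python
# Hypothetical sketch (not the authors' implementation): a dynamic-time-warping
# check used to decide which episode trajectories get extra weight in a
# DDPG replay buffer, loosely following the "recovery experience replay"
# idea described in the abstract. Names and thresholds are assumptions.
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping between two state
    trajectories, each of shape (T, state_dim)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


class RecoveryReplayBuffer:
    """FIFO transition buffer plus a pool of reference trajectories from
    successful jumps; episodes whose DTW distance to any reference falls
    below a threshold are stored twice, raising their sampling probability."""

    def __init__(self, capacity: int = 100_000, dtw_threshold: float = 5.0):
        self.capacity = capacity
        self.dtw_threshold = dtw_threshold
        self.transitions = []   # (state, action, reward, next_state, done)
        self.references = []    # state trajectories of successful jumps

    def add_reference(self, states) -> None:
        self.references.append(np.asarray(states, dtype=float))

    def add_episode(self, episode) -> None:
        states = np.asarray([t[0] for t in episode], dtype=float)
        similar = any(dtw_distance(states, ref) < self.dtw_threshold
                      for ref in self.references)
        repeats = 2 if similar else 1   # over-sample "recoverable" experience
        for _ in range(repeats):
            for t in episode:
                self.transitions.append(t)
                if len(self.transitions) > self.capacity:
                    self.transitions.pop(0)

    def sample(self, batch_size: int, rng=np.random):
        idx = rng.randint(0, len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in idx]
```

In a full training loop, each of the two DDPG networks described in the abstract would draw minibatches from a buffer like this while optimizing its own reward objective; the actual mechanism in the paper may differ.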
Pages: 10
References
33 records in total
  • [1] Optimal Standing Jump Trajectory Generation for Biped Robots
    Ahn, DongHyun
    Cho, Baek-Kyu
    [J]. INTERNATIONAL JOURNAL OF PRECISION ENGINEERING AND MANUFACTURING, 2020, 21 (08) : 1459 - 1467
  • [2] Batke, R., 2022 IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2022, p. 714, DOI 10.1109/Humanoids53995.2022.9999741
  • [3] Robust High-Speed Running for Quadruped Robots via Deep Reinforcement Learning
    Bellegarda, Guillaume
    Chen, Yiyu
    Liu, Zhuochen
    Nguyen, Quan
    [J]. 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 10364 - 10370
  • [4] Bellegarda Q., 2020, arXiv
  • [5] Trajectory Optimization With Implicit Hard Contacts
    Carius, Jan
    Ranftl, Rene
    Koltun, Vladlen
    Hutter, Marco
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2018, 3 (04) : 3316 - 3323
  • [6] Underactuated Motion Planning and Control for Jumping With Wheeled-Bipedal Robots
    Chen, Hua
    Wang, Bingheng
    Hong, Zejun
    Shen, Cong
    Wensing, Patrick M.
    Zhang, Wei
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 747 - 754
  • [7] Hybrid Sampling/Optimization-based Planning for Agile Jumping Robots on Challenging Terrains
    Ding, Yanran
    Zhang, Mengchao
    Li, Chuanzheng
    Park, Hae-Won
    Hauser, Kris
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 2839 - 2845
  • [8] Sim-to-Real Learning of Footstep-Constrained Bipedal Dynamic Walking
    Duan, Helei
    Malik, Ashish
    Dao, Jeremy
    Saxena, Aseem
    Green, Kevin
    Siekmann, Jonah
    Fern, Alan
    Hurst, Jonathan
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 10428 - 10434
  • [9] Eknath, J. A., 2018, Thesis, Indian Institute of Technology
  • [10] Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions
    Escontrela, Alejandro
    Peng, Xue Bin
    Yu, Wenhao
    Zhang, Tingnan
    Iscen, Atil
    Goldberg, Ken
    Abbeel, Pieter
    [J]. 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 25 - 32