A Multiobjective Collaborative Deep Reinforcement Learning Algorithm for Jumping Optimization of Bipedal Robot

Cited by: 2
Authors
Tao, Chongben [1 ,2 ]
Li, Mengru [1 ]
Cao, Feng [3 ]
Gao, Zhen [4 ]
Zhang, Zufeng [5 ]
Affiliations
[1] Suzhou Univ Sci & Technol, Sch Elect & Informat Engn, Suzhou 215009, Peoples R China
[2] Tsinghua Univ, Suzhou Automobile Res Inst, Suzhou 215134, Peoples R China
[3] Shanxi Univ, Sch Comp & Informat Technol, Taiyuan 030006, Peoples R China
[4] McMaster Univ, Fac Engn, Hamilton, ON L8S 0A, Canada
[5] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
bipedal robot; collaborative learning; deep reinforcement learning; experience replay mechanism; jumping;
DOI
10.1002/aisy.202300352
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Due to the nonlinearity and underactuation of bipedal robots, developing efficient jumping strategies remains challenging. To address this, a multiobjective collaborative deep reinforcement learning algorithm based on the actor-critic framework is presented. First, two deep deterministic policy gradient (DDPG) networks are established for training the jumping motion; each focuses on a different objective, and the two collaboratively learn the optimal jumping policy. Next, a recovery experience replay mechanism based on dynamic time warping is integrated into the DDPG networks to improve sample utilization efficiency. Concurrently, a timely adjustment unit is incorporated, which works in tandem with the training frequency to improve the convergence accuracy of the algorithm. In addition, a Markov decision process is designed to handle the complexity and parameter uncertainty of the bipedal robot's dynamic model. Finally, the proposed method is validated on the PyBullet simulation platform. The results show that the method outperforms baseline methods, improving learning speed and enabling robust jumps with greater height and distance. © 2023 WILEY-VCH GmbH
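The abstract does not give implementation details, but the recovery experience replay idea can be illustrated with a short sketch. The snippet below is a hypothetical illustration, not the authors' code: it uses classic dynamic time warping to compare a new episode's state trajectory against reference trajectories from successful jumps, and duplicates transitions from similar episodes so that a DDPG-style replay buffer samples them more often. The names (dtw_distance, RecoveryReplayBuffer, dtw_threshold) and the duplication heuristic are assumptions.

```python
# Hypothetical sketch (not the authors' implementation): a dynamic-time-warping
# check used to decide which episode trajectories get extra weight in a
# DDPG replay buffer, loosely following the "recovery experience replay"
# idea described in the abstract. Names and thresholds are assumptions.
import numpy as np


def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Classic O(len(a) * len(b)) dynamic time warping between two state
    trajectories, each of shape (T, state_dim)."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


class RecoveryReplayBuffer:
    """FIFO transition buffer plus a pool of reference trajectories from
    successful jumps; episodes whose DTW distance to any reference falls
    below a threshold are stored twice, raising their sampling probability."""

    def __init__(self, capacity: int = 100_000, dtw_threshold: float = 5.0):
        self.capacity = capacity
        self.dtw_threshold = dtw_threshold
        self.transitions = []   # (state, action, reward, next_state, done)
        self.references = []    # state trajectories of successful jumps

    def add_reference(self, states) -> None:
        self.references.append(np.asarray(states, dtype=float))

    def add_episode(self, episode) -> None:
        states = np.asarray([t[0] for t in episode], dtype=float)
        similar = any(dtw_distance(states, ref) < self.dtw_threshold
                      for ref in self.references)
        repeats = 2 if similar else 1   # over-sample "recoverable" experience
        for _ in range(repeats):
            for t in episode:
                self.transitions.append(t)
                if len(self.transitions) > self.capacity:
                    self.transitions.pop(0)

    def sample(self, batch_size: int, rng=np.random):
        idx = rng.randint(0, len(self.transitions), size=batch_size)
        return [self.transitions[i] for i in idx]
```

In a full training loop, each of the two DDPG networks described in the abstract would draw minibatches from a buffer like this while optimizing its own reward objective; the actual mechanism in the paper may differ.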
Pages: 10
References
33 records in total
  • [1] Optimal Standing Jump Trajectory Generation for Biped Robots
    Ahn, DongHyun
    Cho, Baek-Kyu
    [J]. INTERNATIONAL JOURNAL OF PRECISION ENGINEERING AND MANUFACTURING, 2020, 21 (08) : 1459 - 1467
  • [2] Batke, R., 2022 IEEE-RAS International Conference on Humanoid Robots (Humanoids), 2022, p. 714, DOI 10.1109/Humanoids53995.2022.9999741
  • [3] Robust High-Speed Running for Quadruped Robots via Deep Reinforcement Learning
    Bellegarda, Guillaume
    Chen, Yiyu
    Liu, Zhuochen
    Nguyen, Quan
    [J]. 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 10364 - 10370
  • [4] Bellegarda Q., 2020, arXiv
  • [5] Trajectory Optimization With Implicit Hard Contacts
    Carius, Jan
    Ranftl, Rene
    Koltun, Vladlen
    Hutter, Marco
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2018, 3 (04) : 3316 - 3323
  • [6] Underactuated Motion Planning and Control for Jumping With Wheeled-Bipedal Robots
    Chen, Hua
    Wang, Bingheng
    Hong, Zejun
    Shen, Cong
    Wensing, Patrick M.
    Zhang, Wei
    [J]. IEEE ROBOTICS AND AUTOMATION LETTERS, 2021, 6 (02) : 747 - 754
  • [7] Hybrid Sampling/Optimization-based Planning for Agile Jumping Robots on Challenging Terrains
    Ding, Yanran
    Zhang, Mengchao
    Li, Chuanzheng
    Park, Hae-Won
    Hauser, Kris
    [J]. 2021 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2021), 2021, : 2839 - 2845
  • [8] Sim-to-Real Learning of Footstep-Constrained Bipedal Dynamic Walking
    Duan, Helei
    Malik, Ashish
    Dao, Jeremy
    Saxena, Aseem
    Green, Kevin
    Siekmann, Jonah
    Fern, Alan
    Hurst, Jonathan
    [J]. 2022 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, ICRA 2022, 2022, : 10428 - 10434
  • [9] Eknath, J. A., 2018, Thesis, Indian Institute of Technology
  • [10] Adversarial Motion Priors Make Good Substitutes for Complex Reward Functions
    Escontrela, Alejandro
    Peng, Xue Bin
    Yu, Wenhao
    Zhang, Tingnan
    Iscen, Atil
    Goldberg, Ken
    Abbeel, Pieter
    [J]. 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2022, : 25 - 32