DDPG Reinforcement Learning Experiment for Improving the Stability of Bipedal Walking of Humanoid Robots

Cited by: 1
Authors
Chun, Yeonghun [1 ]
Choi, Junghun [2 ]
Min, Injoon [1 ]
Ahn, Minsung [3 ]
Han, Jeakweon [4 ]
Affiliations
[1] Hanyang Univ, Dept Convergence Robot Syst, Ansan 15588, South Korea
[2] Hanyang Univ, Dept Mechatron Engn, Ansan 15588, South Korea
[3] Univ Calif Los Angeles, Dept Mech & Aerosp Engn, Los Angeles, CA 90095 USA
[4] Hanyang Univ, Dept Robot, Ansan 15588, South Korea
Source
2023 IEEE/SICE INTERNATIONAL SYMPOSIUM ON SYSTEM INTEGRATION, SII | 2023
Keywords
Humanoid and Bipedal Locomotion; Reinforcement Learning; Bipedal Walking;
DOI
10.1109/SII55687.2023.10039306
CLC classification
TP39 [Applications of Computers];
Discipline codes
081203 ; 0835 ;
Abstract
To improve the stability of bipedal walking of humanoid robots, we developed a method of setting trajectory parameters using reinforcement learning on a treadmill-like testbed in a real-world environment. A deep deterministic policy gradient (DDPG) was used as the reinforcement learning algorithm. By shaping the reward using the zero moment point (ZMP), the optimal trade-off between walking stability and walking speed was determined. The robot was designed to measure the ZMP and to mount weights on its upper body. In addition, a treadmill was manufactured to operate at the same speed as the walking speed of the robot. Reinforcement learning was performed separately for an unweighted case and for a case with a 1 kg weight. Over approximately 100 min, 300 episodes were run, and the reward improved by 16.71% and 26.25%, respectively. The ZMP measurements indicated that bipedal walking remained within a safe area. Therefore, we demonstrated that the bipedal walking performance of a humanoid robot can be improved by reinforcement learning over walking speed and ZMP similarity.
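The abstract describes a reward that combines ZMP-based stability with walking speed. A minimal sketch of such a reward term is shown below; the function name, weighting coefficients, and the simplified circular support-region model are all assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def zmp_speed_reward(zmp, support_center, support_half_width,
                     walk_speed, target_speed,
                     w_zmp=1.0, w_speed=1.0):
    """Hypothetical reward: ZMP similarity (stability) plus speed tracking.

    zmp, support_center   -- 2-D points in the foot/support frame (m)
    support_half_width    -- distance from center to support boundary (m)
    walk_speed            -- measured walking speed (m/s)
    target_speed          -- commanded treadmill speed (m/s)
    """
    # Stability term: 1.0 when the measured ZMP sits at the center of the
    # support region, decaying linearly to 0.0 at the boundary.
    zmp_err = np.linalg.norm(np.asarray(zmp) - np.asarray(support_center))
    zmp_term = max(0.0, 1.0 - zmp_err / support_half_width)

    # Speed term: penalize relative deviation from the commanded speed.
    speed_term = max(0.0, 1.0 - abs(walk_speed - target_speed) / target_speed)

    return w_zmp * zmp_term + w_speed * speed_term
```

In a DDPG setup, a scalar reward of this shape would be returned by the environment after each step; the weights `w_zmp` and `w_speed` set the stability-versus-speed trade-off the abstract refers to.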
Pages: 7