Intelligent proximal-policy-optimization-based decision-making system for humanoid robots

Cited by: 5
Authors
Kuo, Ping-Huan [1 ,2 ]
Yang, Wei-Cyuan [2 ]
Hsu, Po-Wei [1 ]
Chen, Kuan-Lin [3 ]
Affiliations
[1] Natl Chung Cheng Univ, Dept Mech Engn, Chiayi 62102, Taiwan
[2] Natl Chung Cheng Univ, Adv Inst Mfg High tech Innovat AIM HI, Chiayi 62102, Taiwan
[3] Natl Pingtung Univ, Dept Intelligent Robot, Pingtung 90004, Taiwan
Keywords
Humanoid robot; InfoGAN; Deep reinforcement learning; Decision making; Gait pattern generator; DCGAN; RECOGNITION;
DOI
10.1016/j.aei.2023.102009
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With advances in technology, robots have gradually replaced humans in various tasks. Enabling robots to handle multiple situations and perform different actions depending on the situation has therefore become a critical research topic. Training a robot to perform a single designated action is currently considered straightforward. However, when a robot must act in different environments, it requires both resetting and retraining, which is time-consuming and inefficient. Allowing robots to autonomously identify their environment can thus significantly reduce this cost, and employing machine learning algorithms to achieve autonomous robot learning has become a research trend. In this study, to solve the aforementioned problem, a proximal policy optimization algorithm was used to allow a robot to conduct self-training and select an optimal gait pattern to reach its destination successfully. Multiple basic gait patterns were selected, and information-maximizing generative adversarial nets (InfoGAN) were used to generate additional gait patterns, allowing the robot to choose from numerous gait patterns while walking. The experimental results indicated that, after self-learning, the robot successfully made different choices depending on the situation, verifying the feasibility of this approach.
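The proximal policy optimization algorithm named in the abstract maximizes a clipped surrogate objective so that each policy update stays close to the policy that collected the data. A minimal pure-Python sketch of that objective is shown below; the function name and the toy probabilities and advantages are illustrative only, not taken from the paper's implementation:

```python
def ppo_clipped_objective(new_probs, old_probs, advantages, eps=0.2):
    """Clipped surrogate objective of PPO (Schulman et al., 2017).

    new_probs / old_probs: pi(a|s) under the current and the behaviour
    policy for each sampled action; advantages: estimated A(s, a).
    """
    total = 0.0
    for p_new, p_old, adv in zip(new_probs, old_probs, advantages):
        ratio = p_new / p_old                      # importance ratio r_t
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)   # pessimistic bound
    return total / len(advantages)

# Toy example: three sampled gait choices with hypothetical advantages.
obj = ppo_clipped_objective([0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [1.0, -0.5, 0.2])
```

Taking the element-wise minimum of the unclipped and clipped terms gives a lower bound on the policy-improvement objective, which is what prevents destructively large updates during the robot's self-training.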
Pages: 10