Research on Motion Control Method of Manipulator Based on Reinforcement Learning

Cited by: 1
Authors
Yang, Bo [1 ]
Wang, Kun [1 ,2 ]
Ma, Xiangxiang [1 ]
Fan, Biao [1 ]
Xu, Lei [1 ]
Yan, Hao [1 ]
Affiliations
[1] School of Mechanical Engineering, Jiangnan University, Wuxi, Jiangsu
[2] Institute of Intelligent Machines, Chinese Academy of Sciences, Hefei
Keywords
agent; M-PPO algorithm; motion control; multi-agent; reinforcement learning; Unity engine;
DOI
10.3778/j.issn.1002-8331.2207-0159
Abstract
Traditional motion control algorithms suffer from poor environmental adaptability and low efficiency. Reinforcement learning lets an agent explore an environment through repeated trial and error, controlling the manipulator's motion by adjusting neural network parameters according to a reward function. In practice, however, a physical trial-and-error environment cannot be provided for the manipulator. This paper uses the Unity engine to build a digital twin simulation environment for the manipulator, defines the observation state variables and the reward function mechanism, and proposes the M-PPO algorithm, which combines PPO (proximal policy optimization) with multiple agents in this simulated environment to accelerate training and realize intelligent motion control of the manipulator through reinforcement learning. The manipulator's end-effector achieves effective obstacle avoidance and quickly reaches the target object's position. The experimental results of M-PPO are compared with those of M-SAC (multi-agent soft actor-critic) and PPO, and the effectiveness and advancement of the M-PPO algorithm are verified in motion control decision-making under different environments. The approach achieves autonomous planning and decision-making by the digital twin and inverse control of the synchronized motion of the physical manipulator. © 2023 Journal of Computer Engineering and Applications Beijing Co., Ltd. All rights reserved.
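The M-PPO algorithm described in the abstract builds on PPO's clipped surrogate objective, which limits how far each policy update can move from the previous policy. As a minimal illustrative sketch (not the authors' implementation, and omitting the multi-agent parallelization), the per-batch clipped objective can be written as:

```python
import numpy as np

def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Clipped surrogate objective from PPO.

    ratio:     pi_new(a|s) / pi_old(a|s) for each sampled action
    advantage: estimated advantage A(s, a) for each sample
    eps:       clipping range (0.2 is a commonly used default)
    """
    ratio = np.asarray(ratio, dtype=float)
    advantage = np.asarray(advantage, dtype=float)
    unclipped = ratio * advantage
    # Clip the probability ratio to [1 - eps, 1 + eps] before weighting.
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Elementwise minimum gives a pessimistic bound; average over the batch.
    return float(np.mean(np.minimum(unclipped, clipped)))
```

In M-PPO as described, multiple agents collect experience in parallel inside the Unity simulation, and each gradient step maximizes an objective of this form over the pooled samples; the exact network architecture and hyperparameters are those of the paper, not this sketch.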
Pages: 318-325
Number of pages: 7
References
19 in total
[1]  
POOR P, BASL J., Role of collaborative robots in industry 4.0 with target on education in industrial engineering[C], 2019 4th International Conference on Control, Robotics and Cybernetics (CRC), pp. 42-46, (2019)
[2]  
WU J, HE H, PENG J, et al., Continuous reinforcement learning of energy management with deep Q network for a power split hybrid electric bus[J], Applied Energy, 222, pp. 799-811, (2018)
[3]  
SCHULMAN J, LEVINE S, MORITZ P, et al., Trust region policy optimization[J], arXiv:1502.05477, (2015)
[4]  
ZHANG Y, DENG Z, GAO Y., Angle of arrival passive location algorithm based on proximal policy optimization[J], Electronics, 8, 12, (2019)
[5]  
HAARNOJA T, ZHOU A, ABBEEL P, et al., Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[J], (2018)
[6]  
MORALES E F, ZARAGOZA J H., An introduction to reinforcement learning[J], IEEE, 11, 4, pp. 219-354, (2011)
[7]  
LUONG N C, HOANG D T, GONG S, et al., Applications of deep reinforcement learning in communications and networking: a survey[J], IEEE Communications Surveys & Tutorials, 21, 4, pp. 3133-3174, (2019)
[8]  
LIU Y, WU Z, et al., Multiobjective preimpact trajectory planning of space manipulator for self-assembling a heavy payload[J], International Journal of Advanced Robotic Systems, 18, 1, pp. 1-26, (2021)
[9]  
WANG J., Analysis and design of a k-winners-take-all model with a single state variable and the Heaviside step activation function[J], IEEE Transactions on Neural Networks, 21, 9, pp. 1496-1506, (2010)
[10]  
KORMUSHEV P, CALINON S, CALDWELL D G., Imitation learning of positional and force skills demonstrated via kinesthetic teaching and haptic input[J], Advanced Robotics, 25, 5, pp. 581-603, (2011)