AGV path planning and task scheduling based on improved proximal policy optimization algorithm

Cited: 0
Authors
Qi, Xuan [1 ]
Zhou, Tong [2 ]
Wang, Cunsong [2 ]
Peng, Xiaotian [1 ]
Peng, Hao [1 ]
Affiliations
[1] School of Mechanical and Power Engineering, Nanjing Tech University, Nanjing
[2] Institute of Intelligent Manufacturing, Nanjing Tech University, Nanjing
Source
Jisuanji Jicheng Zhizao Xitong/Computer Integrated Manufacturing Systems, CIMS | 2025, Vol. 31, No. 3
Funding
National Natural Science Foundation of China;
Keywords
automated guided vehicle; path planning; proximal policy optimization algorithm; reinforcement learning; task scheduling;
DOI
10.13196/j.cims.2023.0552
CLC number
TP [Automation technology, computer technology];
Discipline code
0812;
Abstract
The Automated Guided Vehicle (AGV) is a type of automated material handling equipment with high flexibility and adaptability. Current research on optimal path planning and scheduling algorithms for AGVs still suffers from poor generalization, low convergence efficiency, and long routing times. Therefore, an improved Proximal Policy Optimization (PPO) algorithm was proposed. A multi-step action selection strategy was adopted to increase the step length of AGV movement, and the AGV action set was expanded from the original 4 directions to 8 directions to optimize the planned path. The dynamic reward function was improved to adjust the reward value in real time according to the current state of the AGV, enhancing its learning ability. The reward value curves of the different improvement methods were then compared to validate the convergence efficiency of the algorithm and the length of the optimal path. Finally, a novel continuous task scheduling optimization algorithm for a single AGV was developed to enhance transportation efficiency. The results showed that, compared with the standard PPO algorithm, the improved algorithm shortened the optimal path by 28.6% and increased convergence efficiency by 78.5%. It performed better on more complex tasks requiring high-level policies and exhibited stronger generalization capability. Compared with Q-Learning, the Deep Q-Network (DQN) algorithm, and the Soft Actor-Critic (SAC) algorithm, the improved algorithm showed efficiency improvements of 84.4%, 83.7%, and 77.9%, respectively. After continuous task scheduling optimization for a single AGV, the average path length was reduced by 47.6%. © 2025 CIMS. All rights reserved.
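This record does not include the paper's implementation. As a rough illustration of the three ideas named in the abstract, the Python sketch below shows an 8-direction action set with a multi-step step length, a state-dependent (dynamic) reward, and a greedy task-chaining heuristic for a single AGV. The function names, the step length k, and all reward coefficients are assumptions made for illustration, not the authors' code or parameter values.

```python
import math

# 8-direction AGV action set on a grid (expanded from 4 to 8 directions
# per the abstract); diagonal moves cover sqrt(2) grid units per cell.
ACTIONS = [
    (0, 1), (0, -1), (1, 0), (-1, 0),    # up, down, right, left
    (1, 1), (1, -1), (-1, 1), (-1, -1),  # four diagonals
]

def multi_step_move(pos, action, k=2):
    """Multi-step action selection: advance k cells per decision.
    The step length k=2 is a hypothetical value, not from the paper."""
    dx, dy = ACTIONS[action]
    return (pos[0] + k * dx, pos[1] + k * dy)

def dynamic_reward(pos, goal, prev_dist, collided):
    """Dynamic reward shaped by the AGV's current state: progress toward
    the goal is rewarded, collisions and idle steps are penalized.
    Returns (reward, distance to carry into the next step).
    All coefficients are illustrative assumptions."""
    dist = math.hypot(goal[0] - pos[0], goal[1] - pos[1])
    if collided:
        return -10.0, prev_dist            # collision penalty; position reverts
    if dist == 0.0:
        return 100.0, dist                 # goal reached
    return (prev_dist - dist) - 0.1, dist  # progress bonus minus step cost

def chain_tasks(start, tasks):
    """Continuous task scheduling sketch for a single AGV: greedily pick
    the next task whose pickup point is nearest to the current position.
    A stand-in heuristic; the abstract does not specify the paper's rule."""
    order, pos, remaining = [], start, list(tasks)  # task: (pickup, dropoff)
    while remaining:
        nxt = min(remaining,
                  key=lambda t: math.hypot(t[0][0] - pos[0], t[0][1] - pos[1]))
        remaining.remove(nxt)
        order.append(nxt)
        pos = nxt[1]  # the next route starts from the previous drop-off
    return order
```

Chaining tasks so that each route starts from the previous drop-off point, rather than returning to a depot between tasks, is one plausible reading of the continuous-scheduling idea behind the reported 47.6% reduction in average path length.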
Pages: 955-964
Page count: 9