Deep reinforcement learning-based spatio-temporal graph neural network for solving job shop scheduling problem

Cited: 0
Authors
Gebreyesus, Goytom [1 ]
Fellek, Getu [1 ]
Farid, Ahmed [1 ]
Hou, Sicheng [1 ]
Fujimura, Shigeru [1 ]
Yoshie, Osamu [1 ]
Institutions
[1] Waseda Univ, Grad Sch Informat Prod & Syst, Fukuoka, Japan
Keywords
Deep reinforcement learning; Spatio-temporal representation; Job shop scheduling; Graph neural network; Migrating birds optimization; Algorithm; Benchmarks; Time
DOI
10.1007/s12065-024-00989-6
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
The job shop scheduling problem (JSSP) is a well-known NP-hard combinatorial optimization problem that assigns tasks to limited resources while adhering to certain constraints. Deep reinforcement learning (DRL)-based solutions are now widely used to solve the JSSP by defining the problem structure on disjunctive graphs. Some proposed approaches leverage the structural information of the JSSP to capture the dynamics of the environment but neglect the time dependency within the JSSP. Learning graph representations from the structural relationships of nodes alone yields weak, incomplete representations that cannot express the dynamics of the environment. In this study, unlike existing frameworks, we define the JSSP as a dynamic graph to explicitly account for the time-varying aspect of the JSSP environment. To this end, we propose a novel DRL framework that captures both the spatial and temporal attributes of the JSSP to construct rich, complete graph representations. The framework introduces a novel attentive graph isomorphism network (Attentive-GIN)-based spatial block to learn structural relationships and a temporal block to capture time dependency. Additionally, we design a gated fusion block that selectively combines the representations learned by the two blocks. We train the model with the proximal policy optimization (PPO) algorithm of reinforcement learning. Experimental results show that the trained model significantly outperforms heuristic dispatching rules and learning-based solutions on both randomly generated datasets and public benchmarks.
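The gated fusion mentioned in the abstract can be illustrated with a minimal sketch: a sigmoid gate mixes a spatial embedding and a temporal embedding element-wise, so each fused value is a convex combination of the two inputs. All function names, weights, and dimensions below are illustrative assumptions, not the paper's actual implementation.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_spatial, h_temporal, w_s, w_t, b):
    """Element-wise gated fusion of two equal-length embeddings:
    g_i   = sigmoid(w_s[i]*h_spatial[i] + w_t[i]*h_temporal[i] + b[i])
    out_i = g_i * h_spatial[i] + (1 - g_i) * h_temporal[i]
    """
    fused = []
    for hs, ht, ws, wt, bi in zip(h_spatial, h_temporal, w_s, w_t, b):
        g = sigmoid(ws * hs + wt * ht + bi)  # gate in (0, 1)
        fused.append(g * hs + (1.0 - g) * ht)
    return fused

# Toy 4-dimensional embeddings for a single operation node.
h_s = [0.5, -1.0, 0.3, 2.0]   # hypothetical spatial (Attentive-GIN) output
h_t = [1.0, 0.0, -0.5, 0.5]   # hypothetical temporal-block output
dim = len(h_s)
w_s = [random.uniform(-1, 1) for _ in range(dim)]
w_t = [random.uniform(-1, 1) for _ in range(dim)]
b = [0.0] * dim

out = gated_fusion(h_s, h_t, w_s, w_t, b)
# Because the gate lies in (0, 1), each fused value stays between
# the corresponding spatial and temporal inputs.
for hs, ht, o in zip(h_s, h_t, out):
    assert min(hs, ht) <= o <= max(hs, ht)
```

In a trained model the gate weights would be learned jointly with the rest of the network, letting the fusion lean on structural information for some nodes and temporal information for others.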
Pages: 18