Dynamic scheduling for flexible job shop using a deep reinforcement learning approach

Cited by: 94
Authors
Gui, Yong [1]
Tang, Dunbing [1]
Zhu, Haihua [1]
Zhang, Yi [1]
Zhang, Zequn [1]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Mech & Elect Engn, Nanjing 210000, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Dynamic flexible job-shop scheduling problem; Single dispatching rule; Markov decision process; Deep reinforcement learning; Deep deterministic policy gradient; MANUFACTURING SYSTEMS; DISPATCHING RULES; GENETIC ALGORITHM; SIMULATION; SELECTION; MACHINES
DOI
10.1016/j.cie.2023.109255
CLC Classification
TP39 [Computer Applications]
Subject Classification Codes
081203; 0835
Abstract
Due to dynamic changes in the manufacturing environment, a single dispatching rule (SDR) cannot consistently outperform other rules on dynamic scheduling problems. Dynamically selecting the most appropriate rule from several SDRs with a Deep Q-Network (DQN) offers better scheduling performance than using any individual SDR. However, the discrete action space of the DQN and the simplicity of using a single SDR as an action limit the selection range and restrict performance improvement. Thus, in this paper, we propose a scheduling method based on deep reinforcement learning for the dynamic flexible job-shop scheduling problem (DFJSP), aiming to minimize the mean tardiness. Firstly, a Markov decision process with a composite scheduling action is formulated to describe the flexible job-shop dynamic scheduling process and transform the DFJSP into an RL task. Subsequently, a composite scheduling action that aggregates SDRs with continuous weight variables is designed to provide a continuous rule space and SDR weight selection. Moreover, a reward function related to the mean tardiness criterion is designed such that maximizing the cumulative reward is equivalent to minimizing the mean tardiness. Finally, a policy network with states as inputs and weights as outputs is constructed to generate the scheduling decision at each decision point. The deep deterministic policy gradient (DDPG) algorithm is used to train the policy network to select the most appropriate weights at each decision point, thereby aggregating the SDRs into a better rule. Results from numerical experiments reveal that the proposed scheduling method achieves significantly better scheduling results than an SDR and the DQN-based method in dynamically changing manufacturing environments.
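The core idea of the composite scheduling action can be illustrated with a minimal sketch: priorities from several SDRs are normalized and blended with continuous weights, so the resulting rule varies smoothly between pure rules such as SPT (shortest processing time) and EDD (earliest due date). The rule choices, normalization, and job attributes below are illustrative assumptions, not the paper's exact formulation; in the paper the weights would come from the DDPG-trained policy network.

```python
# Hypothetical sketch of a composite dispatching rule: SDR priorities are
# aggregated with continuous weights, analogous to the paper's composite
# scheduling action. The specific rules (SPT, EDD), the min-max
# normalization, and the job fields are illustrative assumptions.

def composite_dispatch(jobs, weights):
    """Pick the index of the next job by a weighted blend of SDR priorities.

    jobs    -- list of dicts with 'proc_time' and 'due_date'
    weights -- (w_spt, w_edd), e.g. the output of a trained policy network
    """
    w_spt, w_edd = weights

    def normalize(values):
        # Min-max normalize to [0, 1]; guard against a zero span.
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in values]

    # SPT favors short processing times; EDD favors early due dates.
    spt = normalize([j['proc_time'] for j in jobs])
    edd = normalize([j['due_date'] for j in jobs])

    # Lower normalized value means higher priority, so negate the blend.
    scores = [-(w_spt * s + w_edd * e) for s, e in zip(spt, edd)]
    return max(range(len(jobs)), key=scores.__getitem__)

jobs = [
    {'proc_time': 5, 'due_date': 20},
    {'proc_time': 2, 'due_date': 30},
    {'proc_time': 8, 'due_date': 10},
]
print(composite_dispatch(jobs, (1.0, 0.0)))  # pure SPT -> job 1 (shortest)
print(composite_dispatch(jobs, (0.0, 1.0)))  # pure EDD -> job 2 (earliest due)
```

Because the weights are continuous, intermediate values such as (0.5, 0.5) yield rules that no single SDR can express, which is precisely the enlarged action space the DDPG agent searches over.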
Pages: 13
Cited References (48 total)
[1] Cruz-Chavez, Marco Antonio; Martinez-Rangel, Martin G.; Cruz-Rosales, Martin H. Accelerated simulated annealing algorithm applied to the flexible job shop scheduling problem. International Transactions in Operational Research, 2017, 24(5): 1119-1137.
[2] Aydin, M.E.; Öztemel, E. Dynamic job-shop scheduling using reinforcement learning agents. Robotics and Autonomous Systems, 2000, 33(2-3): 169-178.
[3] Baker, K.R. Sequencing rules and due-date assignments in a job shop. Management Science, 1984, 30(9): 1093-1104.
[4] Blackstone, J.H.; Phillips, D.T.; Hogg, G.L. A state-of-the-art survey of dispatching rules for manufacturing job shop operations. International Journal of Production Research, 1982, 20(1): 27-45.
[5] Chen, C. Journal of Marine Science and Engineering, 2021, 9.
[6] Chen, C.C.; Yih, Y.; Wu, Y.C. Auto-bias selection for developing learning-based scheduling systems. International Journal of Production Research, 1999, 37(9): 1987-2002.
[7] Chen, J.C.; Chen, K.H.; Wu, J.J.; Chen, C.W. A study of the flexible job shop scheduling problem with parallel machines and reentrant process. International Journal of Advanced Manufacturing Technology, 2008, 39(3-4): 344-354.
[8] Cho, H.; Wysk, R.A. A robust adaptive scheduler for an intelligent workstation controller. International Journal of Production Research, 1993, 31(4): 771-789.
[9] Doh, Hyoung-Ho; Yu, Jae-Min; Kim, Ji-Su; Lee, Dong-Ho; Nam, Sung-Ho. A priority scheduling approach for flexible job shops with multiple process plans. International Journal of Production Research, 2013, 51(12): 3748-3764.
[10] Fujimoto, H. IEEE International Conference on Robotics and Automation, 1995, p. 190. DOI: 10.1109/ROBOT.1995.525284.