Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach

被引:26
作者
Wang, Zhenyu [1 ]
Cai, Bin [1 ]
Li, Jun [1 ]
Yang, Deheng [2 ]
Zhao, Yang [1 ]
Xie, Huan [1 ]
机构
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
[2] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
关键词
Non-permutation flow-shop scheduling  problem; Deep reinforcement learning; Long short-term memory; Temporal difference; HEURISTIC ALGORITHM; DECISION-MAKING; M-MACHINE; MAKESPAN;
D O I
10.1016/j.cor.2022.106095
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The non-permutation flow-shop scheduling problem (NPFS) is studied. We model it as a Markov decision process, creating a massive arena for reinforcement learning (RL) algorithms to work. While RL approaches with function approximation generate a significant number of sequences of highly linked states, few studies have examined the connection between the state sequences but merely shuffled their orders. To this end, this paper proposes a novel deep reinforcement learning algorithm, named LSTM-TD(0), to address NPFS. Specifically, we design fifteen state features to represent a production state at each scheduling point and fourteen actions to choose an unprocessed operation on a given machine. This study applies long short-term memory (LSTM) network to capture the intrinsic connection of the state sequences in RL-based scheduling approaches. Moreover, we enhance the LSTM model with the one-step temporal difference (TD(0)) algorithm to select each action impartially in relation to the state value, avoiding the frequent overestimation of action values in Q-learning. The proposed LSTM-TD(0) was trained using two LSTM networks and enhanced by redesigning the reward value. A series of comparative experiments were conducted between simple heuristic rules, metaheuristic rules, general DRL methods, and LSTM-TD(0) using a group of well-known benchmark problems with different scales. Comparative results have confirmed both the superiority and universality of LSTM-TD(0) over its competitors. Scalability tests reveal that our approach can generalize to instances of different sizes without retraining or knowledge transferring.
引用
收藏
页数:13
相关论文
共 53 条
[1]   The Non-Permutation Flow-Shop scheduling problem: A literature review [J].
Alejandro Rossit, Daniel ;
Tohme, Fernando ;
Frutos, Mariano .
OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2018, 77 :143-153
[2]   A simple heuristic for m-machine flow-shop and its applications in routing-scheduling problems [J].
Averbakh, I ;
Berman, O .
OPERATIONS RESEARCH, 1999, 47 (01) :165-170
[3]  
Baker KR, 1974, Introduction to Sequencing and Scheduling
[4]   Two simple and effective heuristics for minimizing the makespan in non-permutation flow shops [J].
Benavides, Alexander J. ;
Ritt, Marcus .
COMPUTERS & OPERATIONS RESEARCH, 2016, 66 :160-169
[5]   A proposed genetic algorithm coding for flow-shop scheduling problems [J].
Boukef, Hela ;
Benrejeb, Mohamed ;
Borne, Pierre .
INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2007, 2 (03) :229-240
[6]  
Campbell HerbertG., 1970, Management science, V16, pB
[7]  
Cesar Y., 2019, REV IB RICA SIST TEC, VE18, P257
[8]   Permutation flow shop scheduling with earliness and tardiness penalties [J].
Chandra, Pankaj ;
Mehta, Peeyush ;
Tirupati, Devanath .
INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2009, 47 (20) :5591-5610
[9]   A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem [J].
Chen, Ronghua ;
Yang, Bo ;
Li, Shi ;
Wang, Shilong .
COMPUTERS & INDUSTRIAL ENGINEERING, 2020, 149
[10]   Benchmarks for shop scheduling problems [J].
Demirkol, E ;
Mehta, S ;
Uzsoy, R .
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1998, 109 (01) :137-141