Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach

被引:18
作者
Wang, Zhenyu [1 ]
Cai, Bin [1 ]
Li, Jun [1 ]
Yang, Deheng [2 ]
Zhao, Yang [1 ]
Xie, Huan [1 ]
机构
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
[2] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
关键词
Non-permutation flow-shop scheduling  problem; Deep reinforcement learning; Long short-term memory; Temporal difference; HEURISTIC ALGORITHM; DECISION-MAKING; M-MACHINE; MAKESPAN;
D O I
10.1016/j.cor.2022.106095
中图分类号
TP39 [计算机的应用];
学科分类号
081203 ; 0835 ;
摘要
The non-permutation flow-shop scheduling problem (NPFS) is studied. We model it as a Markov decision process, creating a massive arena for reinforcement learning (RL) algorithms to work. While RL approaches with function approximation generate a significant number of sequences of highly linked states, few studies have examined the connection between the state sequences but merely shuffled their orders. To this end, this paper proposes a novel deep reinforcement learning algorithm, named LSTM-TD(0), to address NPFS. Specifically, we design fifteen state features to represent a production state at each scheduling point and fourteen actions to choose an unprocessed operation on a given machine. This study applies long short-term memory (LSTM) network to capture the intrinsic connection of the state sequences in RL-based scheduling approaches. Moreover, we enhance the LSTM model with the one-step temporal difference (TD(0)) algorithm to select each action impartially in relation to the state value, avoiding the frequent overestimation of action values in Q-learning. The proposed LSTM-TD(0) was trained using two LSTM networks and enhanced by redesigning the reward value. A series of comparative experiments were conducted between simple heuristic rules, metaheuristic rules, general DRL methods, and LSTM-TD(0) using a group of well-known benchmark problems with different scales. Comparative results have confirmed both the superiority and universality of LSTM-TD(0) over its competitors. Scalability tests reveal that our approach can generalize to instances of different sizes without retraining or knowledge transferring.
引用
收藏
页数:13
相关论文
共 53 条
  • [1] The Non-Permutation Flow-Shop scheduling problem: A literature review
    Alejandro Rossit, Daniel
    Tohme, Fernando
    Frutos, Mariano
    [J]. OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2018, 77 : 143 - 153
  • [2] A simple heuristic for m-machine flow-shop and its applications in routing-scheduling problems
    Averbakh, I
    Berman, O
    [J]. OPERATIONS RESEARCH, 1999, 47 (01) : 165 - 170
  • [3] Baker KR., 1974, INTRO SEQUENCING SCH
  • [4] Two simple and effective heuristics for minimizing the makespan in non-permutation flow shops
    Benavides, Alexander J.
    Ritt, Marcus
    [J]. COMPUTERS & OPERATIONS RESEARCH, 2016, 66 : 160 - 169
  • [5] A proposed genetic algorithm coding for flow-shop scheduling problems
    Boukef, Hela
    Benrejeb, Mohamed
    Borne, Pierre
    [J]. INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2007, 2 (03) : 229 - 240
  • [6] Campbell HerbertG., 1970, Management science, V16, pB
  • [7] Permutation flow shop scheduling with earliness and tardiness penalties
    Chandra, Pankaj
    Mehta, Peeyush
    Tirupati, Devanath
    [J]. INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2009, 47 (20) : 5591 - 5610
  • [8] A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem
    Chen, Ronghua
    Yang, Bo
    Li, Shi
    Wang, Shilong
    [J]. COMPUTERS & INDUSTRIAL ENGINEERING, 2020, 149
  • [9] Benchmarks for shop scheduling problems
    Demirkol, E
    Mehta, S
    Uzsoy, R
    [J]. EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1998, 109 (01) : 137 - 141
  • [10] A best-of-breed iterated greedy for the permutation flowshop scheduling problem with makespan objective
    Fernandez-Viagas, Victor
    Framinan, Jose M.
    [J]. COMPUTERS & OPERATIONS RESEARCH, 2019, 112