Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach

Cited by: 18
Authors
Wang, Zhenyu [1]
Cai, Bin [1]
Li, Jun [1]
Yang, Deheng [2]
Zhao, Yang [1]
Xie, Huan [1]
Affiliations
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
[2] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
Keywords
Non-permutation flow-shop scheduling problem; Deep reinforcement learning; Long short-term memory; Temporal difference; HEURISTIC ALGORITHM; DECISION-MAKING; M-MACHINE; MAKESPAN
DOI
10.1016/j.cor.2022.106095
CLC number
TP39 [Applications of computers]
Subject classification codes
081203; 0835
Abstract
The non-permutation flow-shop scheduling problem (NPFS) is studied. We model it as a Markov decision process, which makes it amenable to reinforcement learning (RL) algorithms. RL approaches with function approximation generate large numbers of sequences of highly correlated states, yet few studies have examined the connections within these state sequences; most simply shuffle their order. To this end, this paper proposes a novel deep reinforcement learning algorithm, named LSTM-TD(0), to address NPFS. Specifically, we design fifteen state features to represent the production state at each scheduling point and fourteen actions for choosing an unprocessed operation on a given machine. A long short-term memory (LSTM) network is applied to capture the intrinsic connections within the state sequences of RL-based scheduling. Moreover, we combine the LSTM model with the one-step temporal-difference (TD(0)) algorithm so that each action is selected on the basis of state values, avoiding the overestimation of action values that frequently occurs in Q-learning. LSTM-TD(0) is trained with two LSTM networks and further enhanced by a redesigned reward function. Comparative experiments among simple heuristic rules, metaheuristics, general DRL methods, and LSTM-TD(0) were conducted on a group of well-known benchmark problems of different scales. The results confirm both the superiority and the generality of LSTM-TD(0) over its competitors. Scalability tests show that our approach generalizes to instances of different sizes without retraining or knowledge transfer.
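The abstract outlines the core mechanism: an LSTM reads the sequence of fifteen-dimensional production-state features and estimates a state value that is updated with one-step temporal-difference, TD(0), targets. The sketch below illustrates that combination in PyTorch; the hidden size, discount factor, reward, and training step are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): an LSTM state-value network
# trained with one-step temporal-difference, TD(0), updates, assuming
# 15-dimensional state features as described in the abstract.
import torch
import torch.nn as nn

N_FEATURES = 15   # state features per scheduling point (from the abstract)
GAMMA = 0.99      # discount factor (assumed, not given in the abstract)

class LSTMValueNet(nn.Module):
    """Maps a sequence of production states to a value estimate for the latest state."""
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=N_FEATURES, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, state_seq: torch.Tensor) -> torch.Tensor:
        # state_seq: (batch, seq_len, N_FEATURES)
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1, :]).squeeze(-1)  # V(s_t) from the last hidden state

def td0_update(value_net, optimizer, seq_t, reward, seq_next, done):
    """One TD(0) step: move V(s_t) toward the target r + gamma * V(s_{t+1})."""
    v_t = value_net(seq_t)
    with torch.no_grad():
        v_next = torch.zeros_like(v_t) if done else value_net(seq_next)
        target = reward + GAMMA * v_next
    loss = nn.functional.mse_loss(v_t, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Minimal usage with random data standing in for one scheduling decision.
if __name__ == "__main__":
    net = LSTMValueNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    seq_t = torch.randn(1, 8, N_FEATURES)     # states observed up to the current decision
    seq_next = torch.randn(1, 9, N_FEATURES)  # same history plus the resulting next state
    loss = td0_update(net, opt, seq_t, torch.tensor([-1.0]), seq_next, done=False)
    print(f"TD(0) loss: {loss:.4f}")
```

In this sketch the reward of -1.0 is only a placeholder; the paper redesigns the reward value, but its exact form is not stated in the abstract.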
Pages: 13