Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach

Cited by: 18
Authors
Wang, Zhenyu [1]
Cai, Bin [1]
Li, Jun [1]
Yang, Deheng [2]
Zhao, Yang [1]
Xie, Huan [1]
Affiliations
[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China
[2] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China
Keywords
Non-permutation flow-shop scheduling problem; Deep reinforcement learning; Long short-term memory; Temporal difference; HEURISTIC ALGORITHM; DECISION-MAKING; M-MACHINE; MAKESPAN
DOI
10.1016/j.cor.2022.106095
CLC number
TP39 [Applications of computers]
Subject classification codes
081203; 0835
Abstract
The non-permutation flow-shop scheduling problem (NPFS) is studied. We model it as a Markov decision process, which makes it amenable to reinforcement learning (RL) algorithms. RL approaches with function approximation generate large numbers of sequences of highly correlated states, yet few studies have examined the connections within these state sequences; most simply shuffle their order. To this end, this paper proposes a novel deep reinforcement learning algorithm, named LSTM-TD(0), to address NPFS. Specifically, we design fifteen state features to represent the production state at each scheduling point and fourteen actions for choosing an unprocessed operation on a given machine. A long short-term memory (LSTM) network is applied to capture the intrinsic connections within the state sequences of RL-based scheduling. Moreover, we combine the LSTM model with the one-step temporal-difference (TD(0)) algorithm so that each action is selected on the basis of state values, avoiding the overestimation of action values that frequently occurs in Q-learning. LSTM-TD(0) is trained with two LSTM networks and further enhanced by a redesigned reward function. Comparative experiments among simple heuristic rules, metaheuristics, general DRL methods, and LSTM-TD(0) were conducted on a group of well-known benchmark problems of different scales. The results confirm both the superiority and the generality of LSTM-TD(0) over its competitors. Scalability tests show that our approach generalizes to instances of different sizes without retraining or knowledge transfer.
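The abstract outlines the core mechanism: an LSTM reads the sequence of fifteen-dimensional production-state features and estimates a state value that is updated with one-step temporal-difference, TD(0), targets. The sketch below illustrates that combination in PyTorch; the hidden size, discount factor, reward, and training step are illustrative assumptions, not details taken from the paper.

```python
# Hypothetical sketch (not the authors' code): an LSTM state-value network
# trained with one-step temporal-difference, TD(0), updates, assuming
# 15-dimensional state features as described in the abstract.
import torch
import torch.nn as nn

N_FEATURES = 15   # state features per scheduling point (from the abstract)
GAMMA = 0.99      # discount factor (assumed, not given in the abstract)

class LSTMValueNet(nn.Module):
    """Maps a sequence of production states to a value estimate for the latest state."""
    def __init__(self, hidden_size: int = 64):
        super().__init__()
        self.lstm = nn.LSTM(input_size=N_FEATURES, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, state_seq: torch.Tensor) -> torch.Tensor:
        # state_seq: (batch, seq_len, N_FEATURES)
        out, _ = self.lstm(state_seq)
        return self.head(out[:, -1, :]).squeeze(-1)  # V(s_t) from the last hidden state

def td0_update(value_net, optimizer, seq_t, reward, seq_next, done):
    """One TD(0) step: move V(s_t) toward the target r + gamma * V(s_{t+1})."""
    v_t = value_net(seq_t)
    with torch.no_grad():
        v_next = torch.zeros_like(v_t) if done else value_net(seq_next)
        target = reward + GAMMA * v_next
    loss = nn.functional.mse_loss(v_t, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Minimal usage with random data standing in for one scheduling decision.
if __name__ == "__main__":
    net = LSTMValueNet()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    seq_t = torch.randn(1, 8, N_FEATURES)     # states observed up to the current decision
    seq_next = torch.randn(1, 9, N_FEATURES)  # same history plus the resulting next state
    loss = td0_update(net, opt, seq_t, torch.tensor([-1.0]), seq_next, done=False)
    print(f"TD(0) loss: {loss:.4f}")
```

In this sketch the reward of -1.0 is only a placeholder; the paper redesigns the reward value, but its exact form is not stated in the abstract.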
Pages: 13