Solving non-permutation flow-shop scheduling problem via a novel deep reinforcement learning approach

被引：26

作者：

Wang, Zhenyu ^{[1
]}

Cai, Bin ^{[1
]}

Li, Jun ^{[1
]}

Yang, Deheng ^{[2
]}

Zhao, Yang ^{[1
]}

Xie, Huan ^{[1
]}

机构：

[1] Chongqing Univ, Sch Big Data & Software Engn, Chongqing 401331, Peoples R China

[2] Natl Univ Def Technol, Changsha 410073, Hunan, Peoples R China

来源：

COMPUTERS & OPERATIONS RESEARCH | 2023年 / 151卷

关键词：

Non-permutation flow-shop scheduling  problem; Deep reinforcement learning; Long short-term memory; Temporal difference; HEURISTIC ALGORITHM; DECISION-MAKING; M-MACHINE; MAKESPAN;

D O I：

10.1016/j.cor.2022.106095

中图分类号：

TP39 [计算机的应用];

学科分类号：

081203 ; 0835 ;

摘要：

The non-permutation flow-shop scheduling problem (NPFS) is studied. We model it as a Markov decision process, creating a massive arena for reinforcement learning (RL) algorithms to work. While RL approaches with function approximation generate a significant number of sequences of highly linked states, few studies have examined the connection between the state sequences but merely shuffled their orders. To this end, this paper proposes a novel deep reinforcement learning algorithm, named LSTM-TD(0), to address NPFS. Specifically, we design fifteen state features to represent a production state at each scheduling point and fourteen actions to choose an unprocessed operation on a given machine. This study applies long short-term memory (LSTM) network to capture the intrinsic connection of the state sequences in RL-based scheduling approaches. Moreover, we enhance the LSTM model with the one-step temporal difference (TD(0)) algorithm to select each action impartially in relation to the state value, avoiding the frequent overestimation of action values in Q-learning. The proposed LSTM-TD(0) was trained using two LSTM networks and enhanced by redesigning the reward value. A series of comparative experiments were conducted between simple heuristic rules, metaheuristic rules, general DRL methods, and LSTM-TD(0) using a group of well-known benchmark problems with different scales. Comparative results have confirmed both the superiority and universality of LSTM-TD(0) over its competitors. Scalability tests reveal that our approach can generalize to instances of different sizes without retraining or knowledge transferring.

引用

页数：13

共 53 条

[1] The Non-Permutation Flow-Shop scheduling problem: A literature review [J].

Alejandro Rossit, Daniel ;

Tohme, Fernando ;

Frutos, Mariano .

OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2018, 77 :143-153

[2] A simple heuristic for m-machine flow-shop and its applications in routing-scheduling problems [J].

Averbakh, I ;

Berman, O .

OPERATIONS RESEARCH, 1999, 47 (01) :165-170

[3]

Baker KR, 1974, Introduction to Sequencing and Scheduling

[4] Two simple and effective heuristics for minimizing the makespan in non-permutation flow shops [J].

Benavides, Alexander J. ;

Ritt, Marcus .

COMPUTERS & OPERATIONS RESEARCH, 2016, 66 :160-169

[5] A proposed genetic algorithm coding for flow-shop scheduling problems [J].

Boukef, Hela ;

Benrejeb, Mohamed ;

Borne, Pierre .

INTERNATIONAL JOURNAL OF COMPUTERS COMMUNICATIONS & CONTROL, 2007, 2 (03) :229-240

[6]

Campbell HerbertG., 1970, Management science, V16, pB

[7]

Cesar Y., 2019, REV IB RICA SIST TEC, VE18, P257

[8] Permutation flow shop scheduling with earliness and tardiness penalties [J].

Chandra, Pankaj ;

Mehta, Peeyush ;

Tirupati, Devanath .

INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2009, 47 (20) :5591-5610

[9] A self-learning genetic algorithm based on reinforcement learning for flexible job-shop scheduling problem [J].

Chen, Ronghua ;

Yang, Bo ;

Li, Shi ;

Wang, Shilong .

COMPUTERS & INDUSTRIAL ENGINEERING, 2020, 149

[10] Benchmarks for shop scheduling problems [J].

Demirkol, E ;

Mehta, S ;

Uzsoy, R .

EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 1998, 109 (01) :137-141

← 1 2 3 4 5 6 →