A Double Deep Q-Network framework for a flexible job shop scheduling problem with dynamic job arrivals and urgent job insertions
Cited by: 13
Authors:
Lu, Shaojun [1,2,3]; Wang, Yongqi [1]; Kong, Min [1,4]; Wang, Weizhong [4]; Tan, Weimin [4]; Song, Yingxin [4]
Affiliations:
[1] Hefei Univ Technol, Sch Management, Hefei 230009, Peoples R China
[2] Univ Florida, Ctr Appl Optimizat, Dept Ind & Syst Engn, Gainesville, FL USA
[3] Minist Educ, Key Lab Proc Optimizat & Intelligent Decis Making, Hefei 230009, Peoples R China
[4] Anhui Normal Univ, Sch Econ & Management, Wuhu 241000, Peoples R China
Funding:
China Postdoctoral Science Foundation;
National Natural Science Foundation of China;
Keywords:
Semiconductor manufacture;
Dynamic flexible job shop scheduling;
Double deep Q-Network;
Dynamic job arrivals;
Urgent job insertions;
OPTIMIZATION;
SEARCH;
DOI:
10.1016/j.engappai.2024.108487
Chinese Library Classification: TP [automation technology, computer technology];
Discipline classification code: 0812;
Abstract:
In the semiconductor manufacturing industry, the Dynamic Flexible Job Shop Scheduling Problem is regarded as one of the most complex and significant scheduling problems. Existing studies consider the dynamic arrival of jobs; however, the insertion of urgent jobs, such as testing chips, challenges the production model, and new scheduling methods are urgently needed to improve the dynamic response and self-adjustment of the shop floor. In this work, deep reinforcement learning is used to address the dynamic flexible job shop scheduling problem and enable near-real-time shop-floor decision-making. We extract eight state features, including machine utilization and operation completion rate, to reflect real-time shop-floor production data. After examining machine availability times, the machine's earliest available time is redefined and incorporated into the design of compound scheduling rules. Eight compound scheduling rules are developed for job selection and machine allocation. Using the state features as inputs to a Double Deep Q-Network, the state-action values (Q-values) of each compound scheduling rule are obtained, and the intelligent agent learns a sound optimization strategy through training. Simulation studies show that the proposed Double Deep Q-Network algorithm outperforms other heuristics and well-known scheduling rules, generating high-quality solutions quickly. In most scenarios, the Double Deep Q-Network algorithm also outperforms the Deep Q-Network, Q-Learning, and State-Action-Reward-State-Action (SARSA) frameworks. Moreover, the intelligent agent generalizes well when optimizing similar objectives.
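To make the mechanism described in the abstract concrete, the minimal sketch below (not from the paper; the layer sizes, discount factor, epsilon value, and toy transition are all illustrative assumptions) maps an 8-dimensional shop-floor state to Q-values over 8 compound scheduling rules, and performs the Double DQN update in which the online network selects the next action and the target network evaluates it:

import torch
import torch.nn as nn

N_FEATURES = 8   # state features, e.g. machine utilization, operation completion rate
N_RULES = 8      # compound scheduling rules (job selection + machine allocation)
GAMMA = 0.95     # discount factor (assumed, not the paper's setting)

class QNetwork(nn.Module):
    """Maps the 8-dimensional shop-floor state to one Q-value per rule."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(N_FEATURES, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, N_RULES),
        )
    def forward(self, state):
        return self.net(state)

online, target = QNetwork(), QNetwork()
target.load_state_dict(online.state_dict())  # re-synced periodically during training
opt = torch.optim.Adam(online.parameters(), lr=1e-3)

def select_rule(state, eps=0.1):
    """Epsilon-greedy choice among the compound scheduling rules."""
    if torch.rand(()) < eps:
        return int(torch.randint(N_RULES, ()))
    with torch.no_grad():
        return int(online(state).argmax())

def ddqn_update(state, action, reward, next_state, done):
    """One Double DQN step on a single transition (replay batching omitted)."""
    q_pred = online(state)[action]
    with torch.no_grad():
        # The "double" step: the online net selects the next rule and the
        # target net evaluates it, reducing vanilla DQN's overestimation bias.
        best_next = online(next_state).argmax()
        q_target = reward + GAMMA * target(next_state)[best_next] * (1.0 - done)
    loss = nn.functional.mse_loss(q_pred, q_target)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Toy transition with random tensors standing in for simulator output.
s, s2 = torch.rand(N_FEATURES), torch.rand(N_FEATURES)
a = select_rule(s)
ddqn_update(s, a, reward=torch.tensor(1.0), next_state=s2, done=torch.tensor(0.0))

In the paper's setting, the reward would be derived from the scheduling objective (e.g., delay of urgent jobs) and the transition would come from a shop-floor simulator; both are stubbed here with random tensors.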
Pages: 22