Reinforcement learning for online optimization of job-shop scheduling in a smart manufacturing factory

Cited by: 22

Authors:

Zhou, Tong [1]

Zhu, Haihua [1]

Tang, Dunbing [1]

Liu, Changchun [1]

Cai, Qixiang [1]

Shi, Wei [1]

Gui, Yong [1]

Affiliations:

[1] Nanjing Univ Aeronaut & Astronaut, Coll Mech & Elect Engn, 29 Yudao St, Nanjing 210016, Peoples R China

Source:

ADVANCES IN MECHANICAL ENGINEERING | 2022 / Vol. 14 / No. 3

Funding:

National Natural Science Foundation of China;

Keywords:

Job shop; online scheduling; multi-objective optimization; composite reward; reinforcement learning; SYSTEM; GAME; ALGORITHM; GO;

DOI:

10.1177/16878132221086120

Chinese Library Classification (CLC) number:

O414.1 [Thermodynamics];

Discipline classification code:

Abstract:

The job-shop scheduling problem (JSSP) is a complex combinatorial problem, especially in dynamic environments. Low-volume-high-mix orders contain various design specifications that bring a large number of uncertainties to manufacturing systems. Traditional scheduling methods are limited in handling diverse manufacturing resources in a dynamic environment. In recent years, artificial intelligence (AI) has attracted the interest of researchers in solving dynamic scheduling problems. However, it is difficult to optimize scheduling policies for online decision making while considering multiple objectives. Therefore, this paper proposes a smart scheduler to handle real-time jobs and unexpected events in smart manufacturing factories. New composite reward functions are formulated to improve the decision-making ability and learning efficiency of the smart scheduler. Based on deep reinforcement learning (RL), the smart scheduler autonomously learns to schedule manufacturing resources in real time and improves its decision-making ability dynamically. We evaluate and validate the proposed scheduling model with a series of experiments on a smart factory testbed. Experimental results show that the smart scheduler not only achieves good learning and scheduling performance by optimizing the composite reward functions, but also copes with unexpected events (e.g., urgent or simultaneous orders, machine failures) and balances efficiency and profit.


Pages: 19
