A multi-level action coupling reinforcement learning approach for online two-stage flexible assembly flow shop scheduling
Cited by: 0
Authors:
Qiu, Junhao [1]
Liu, Jianjun [1]
Li, Zhantao [1]
Lai, Xinjun [1]
Affiliations:
[1] Guangdong Univ Technol, Guangdong Prov Key Lab Comp Integrated Mfg Syst, State Key Lab Precis Elect Mfg Technol & Equipment, Guangzhou 510006, Guangdong, Peoples R China
Keywords:
Flexible assembly flow shop;
Multi-product delivery;
Online scheduling;
Reinforcement learning;
Multi-level rule combinations;
Asynchronous execution;
BOUND ALGORITHM;
SETUP TIMES;
MINIMIZE;
SYSTEM;
DOI:
Not available
Chinese Library Classification (CLC) code:
T [Industrial Technology];
Discipline classification code:
08;
Abstract:
Multi-product centralized delivery and kitting assembly present significant challenges to hierarchical co-processing in multi-stage manufacturing systems. Combinations of priority dispatching rules at each level are only transiently adaptive, and their performance in online scheduling deteriorates rapidly as the environment changes. This paper investigates the selection of rule combinations for sustained, high-performance, responsive scheduling in the two-stage flexible assembly flow shop scheduling problem with asynchronous execution and complex decision correlation. A Multi-Level Action Coupling Deep Q-Network (MLAC-DQN) approach is proposed for adaptive integrated scheduling in hybrid processing and assembly shops. First, the problem is formulated as an event-triggered, integrated-decision Markov decision process. A prioritized batch experience replay mechanism is employed to retain the complete correlation information of key decision sequences. Then, coupling and sequence feature extraction modules are developed to enhance the agent's ability to perceive the execution process and the environment. Furthermore, a multi-level wait-limit mechanism and an efficient action filtering mechanism are designed to mitigate wasteful waiting and action space explosion during learning. Finally, a series of experiments is conducted to validate the effectiveness of the proposed methodology. In 20 actual instances of different sizes, MLAC-DQN outperformed its closest competitor, with a 26.6% improvement in average tardiness. Moreover, strong robustness is demonstrated in 16 sets of experiments involving different configurations of resources, orders, and arrival concentration levels.
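As a rough illustration of what choosing among multi-level rule combinations with action filtering can look like, the following minimal sketch builds a joint action space from one dispatching rule per level and picks an action epsilon-greedily among the actions that a feasibility mask allows. It is not the authors' implementation: the rule names (SPT, EDD, FIFO, SLACK), the mask, the Q-values, and the function name select_action are all assumptions made for illustration, since the abstract does not specify them.

```python
import itertools
import random

# Hypothetical rule sets; the abstract does not name the rules actually used.
STAGE1_RULES = ["SPT", "EDD", "FIFO"]   # assumed processing-stage rules
STAGE2_RULES = ["EDD", "SLACK"]         # assumed assembly-stage rules

# One action = one rule per level, so the joint space is the Cartesian product.
ACTIONS = list(itertools.product(STAGE1_RULES, STAGE2_RULES))


def select_action(q_values, mask, epsilon, rng):
    """Epsilon-greedy choice restricted to actions whose mask entry is True."""
    allowed = [i for i, ok in enumerate(mask) if ok]
    if not allowed:
        raise ValueError("no feasible action")
    if rng.random() < epsilon:
        # Explore: uniform choice among the feasible actions only.
        return rng.choice(allowed)
    # Exploit: feasible action with the highest estimated Q-value.
    return max(allowed, key=lambda i: q_values[i])
```

Filtering before the argmax keeps infeasible combinations from being selected even when their estimated value is high, which is one simple way to limit the growth of the effective action space.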
Pages: 370 / 370
Number of pages: 1