Reinforcement learning applied to production planning and control

Cited by: 73

Authors:

Esteso, Ana [1]

Peidro, David [2]

Mula, Josefa [2]

Diaz-Madronero, Manuel [2]

Affiliations:

[1] Univ Politecn Valencia, Res Ctr Prod Management & Engn CIGIP, Valencia, Spain

[2] Univ Politecn Valencia, Res Ctr Prod Management & Engn CIGIP, C Alarcon 1, Alicante, Spain

Source:

INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH | 2023, Vol. 61, No. 16

Keywords:

Artificial intelligence; machine learning; reinforcement learning; deep reinforcement learning; production planning and control; Industry 4.0; networks

DOI:

10.1080/00207543.2022.2104180

Chinese Library Classification (CLC) number:

T [Industrial technology]

Discipline classification code:

08

Abstract:

The objective of this paper is to examine the use and applications of reinforcement learning (RL) techniques in the production planning and control (PPC) field addressing the following PPC areas: facility resource planning, capacity planning, purchase and supply management, production scheduling and inventory management. The main RL characteristics, such as method, context, states, actions, reward and highlights, were analysed. The considered number of agents, applications and RL software tools, specifically, programming language, platforms, application programming interfaces and RL frameworks, among others, were identified, and 181 articles were reviewed. The results showed that RL was applied mainly to production scheduling problems, followed by purchase and supply management. The most revised RL algorithms were model-free and single-agent and were applied to simplified PPC environments. Nevertheless, their results seem to be promising compared to traditional mathematical programming and heuristics/metaheuristics solution methods, and even more so when they incorporate uncertainty or non-linear properties. Finally, RL value-based approaches are the most widely used, specifically Q-learning and its variants and, for deep RL, deep Q-networks. In recent years, however, the most widely used approach has been the actor-critic method, such as the advantage actor critic, proximal policy optimisation, deep deterministic policy gradient and trust region policy optimisation.

Pages: 5772-5789

Number of pages: 18
