Reinforcement learning applied to production planning and control

Cited: 73
Authors
Esteso, Ana [1 ]
Peidro, David [2 ]
Mula, Josefa [2 ]
Diaz-Madronero, Manuel [2 ]
Affiliations
[1] Univ Politecn Valencia, Res Ctr Prod Management & Engn CIGIP, Valencia, Spain
[2] Univ Politecn Valencia, Res Ctr Prod Management & Engn CIGIP, C Alarcon 1, Alicante, Spain
Keywords
Artificial intelligence; machine learning; reinforcement learning; deep reinforcement learning; production planning and control; Industry 4.0; networks
DOI
10.1080/00207543.2022.2104180
Chinese Library Classification (CLC)
T [Industrial technology]
Discipline code
08
Abstract
The objective of this paper is to examine the use and applications of reinforcement learning (RL) techniques in the production planning and control (PPC) field, addressing the following PPC areas: facility resource planning, capacity planning, purchase and supply management, production scheduling and inventory management. The main RL characteristics, such as method, context, states, actions, reward and highlights, were analysed. The number of agents considered, the applications and the RL software tools, specifically, programming language, platforms, application programming interfaces and RL frameworks, among others, were identified, and 181 articles were reviewed. The results showed that RL was applied mainly to production scheduling problems, followed by purchase and supply management. The most frequently reviewed RL algorithms were model-free and single-agent, and were applied to simplified PPC environments. Nevertheless, their results seem promising compared to traditional mathematical programming and heuristic/metaheuristic solution methods, and even more so when the problems incorporate uncertainty or non-linear properties. Finally, RL value-based approaches are the most widely used, specifically Q-learning and its variants and, for deep RL, deep Q-networks. In recent years, however, the most widely used approaches have been actor-critic methods, such as advantage actor-critic, proximal policy optimisation, deep deterministic policy gradient and trust region policy optimisation.
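To make the value-based approach the abstract highlights concrete, the sketch below applies tabular Q-learning to a toy single-item inventory problem, one of the PPC areas the review covers. This is an illustrative assumption, not an example from the paper: the stock levels, cost parameters and demand distribution are all invented for the sketch.

```python
import random

# Toy illustration (not from the reviewed paper): tabular Q-learning for a
# single-item inventory problem. States are on-hand stock levels 0..MAX_STOCK,
# actions are order quantities 0..MAX_STOCK, demand is random each period.
# All parameter values below are illustrative assumptions.
MAX_STOCK = 5
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1

# Q-table indexed as Q[state][action]
Q = [[0.0] * (MAX_STOCK + 1) for _ in range(MAX_STOCK + 1)]

def step(stock, order):
    """One period: receive the order, face random demand, return (next_state, reward)."""
    stock = min(stock + order, MAX_STOCK)          # capacity-limited replenishment
    demand = random.randint(0, 3)
    sold = min(stock, demand)
    # revenue per unit sold, minus holding cost and lost-sales penalty
    reward = 2.0 * sold - 0.5 * (stock - sold) - 1.0 * (demand - sold)
    return stock - sold, reward

random.seed(0)
state = 0
for _ in range(20000):
    # epsilon-greedy action selection
    if random.random() < EPSILON:
        action = random.randint(0, MAX_STOCK)
    else:
        action = max(range(MAX_STOCK + 1), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
    Q[state][action] += ALPHA * (reward + GAMma if False else reward + GAMMA * max(Q[next_state]) - Q[state][action])
    state = next_state

# Greedy policy: preferred order quantity for each stock level
policy = [max(range(MAX_STOCK + 1), key=lambda a: Q[s][a]) for s in range(MAX_STOCK + 1)]
print("greedy order quantity per stock level:", policy)
```

The model-free, single-agent setup shown here matches the pattern the review identifies as dominant; the deep RL variants it discusses (DQN, actor-critic methods) replace the Q-table with a neural network approximator.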
Pages: 5772-5789 (18 pages)