Risk-averse supply chain management via robust reinforcement learning

Cited by: 3
Authors
Wang, Jing [1 ]
Swartz, Christopher L. E. [2 ]
Huang, Kai [3 ]
Affiliations
[1] McMaster Univ, Sch Computat Sci & Engn, 1280 Main St West, Hamilton, ON L8S 4K1, Canada
[2] McMaster Univ, Dept Chem Engn, 1280 Main St West, Hamilton, ON L8S 4L7, Canada
[3] McMaster Univ, DeGroote Sch Business, 1280 Main St West, Hamilton, ON L8S 4M4, Canada
Keywords
Supply chain management; Reinforcement learning; Risk management; Worst-case criterion; Closed-loop supply chain; Supply chain simulation; SUPPLY-CHAIN; ORDERING MANAGEMENT; INVENTORY CONTROL; PROCESS SYSTEMS; BIG DATA; OPTIMIZATION; UNCERTAINTY; MODEL; NETWORK;
DOI
10.1016/j.compchemeng.2024.108912
Chinese Library Classification
TP39 [Computer applications];
Discipline codes
081203; 0835;
Abstract
Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, robust Q-learning and β-pessimistic Q-learning, are examined against conventional Q-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that Q-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned β-pessimistic Q-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.
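For intuition only, the following minimal Python sketch illustrates the kind of update that distinguishes β-pessimistic Q-learning from classical Q-learning in a toy single-echelon inventory setting, alongside an order-up-to baseline. The environment dynamics, cost parameters, and the Gaskett-style blended max/min target are assumptions made for illustration; they are not taken from the paper, whose agent-based simulator and closed-loop supply chain structure are considerably richer.

# Minimal, illustrative sketch of tabular beta-pessimistic Q-learning for a
# toy single-echelon inventory problem. All names, the demand model, and the
# cost parameters below are assumptions for illustration only.
import random

random.seed(0)

MAX_INV = 20             # inventory capacity (assumed)
ACTIONS = range(0, 11)   # order quantities 0..10 (assumed)
GAMMA, ALPHA = 0.95, 0.1
BETA = 0.2               # pessimism weight: 0 recovers the standard Q-learning target

HOLDING_COST, SHORTAGE_PENALTY, UNIT_PRICE = 1.0, 10.0, 5.0

Q = {(s, a): 0.0 for s in range(MAX_INV + 1) for a in ACTIONS}

def step(inv, order):
    """One period of a toy inventory environment (assumed dynamics)."""
    demand = random.randint(0, 8)
    on_hand = min(MAX_INV, inv + order)
    sales = min(on_hand, demand)
    next_inv = on_hand - sales
    shortage = demand - sales
    reward = UNIT_PRICE * sales - HOLDING_COST * next_inv - SHORTAGE_PENALTY * shortage
    return next_inv, reward

def pessimistic_target(s_next):
    """Blend best- and worst-case next-state values (beta-pessimistic target)."""
    values = [Q[(s_next, a)] for a in ACTIONS]
    return (1.0 - BETA) * max(values) + BETA * min(values)

inv = 0
for episode in range(2000):
    # epsilon-greedy action selection
    if random.random() < 0.1:
        a = random.choice(list(ACTIONS))
    else:
        a = max(ACTIONS, key=lambda x: Q[(inv, x)])
    next_inv, r = step(inv, a)
    # beta-pessimistic Q-update; with BETA = 0 this is classical Q-learning
    Q[(inv, a)] += ALPHA * (r + GAMMA * pessimistic_target(next_inv) - Q[(inv, a)])
    inv = next_inv

def order_up_to(inv, S=12):
    """Order-up-to baseline: raise the inventory position to a target level S."""
    return max(0, S - inv)

With BETA = 0 the update reduces to the classical Q-learning target; increasing BETA weights the worst-case next-state value more heavily, which is one way to obtain the more conservative inventory decisions described in the abstract.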
Pages: 17