Risk-averse chain via robust reinforcement

Cited by: 1

Authors:

Wang, Jing [1]

Swartz, Christopher L. E. [2]

Huang, Kai [3]

Affiliations:

[1] McMaster Univ, Sch Computat Sci & Engn, 1280 Main St West, Hamilton, ON L8S 4K1, Canada

[2] McMaster Univ, Dept Chem Engn, 1280 Main St West, Hamilton, ON L8S 4L7, Canada

[3] McMaster Univ, DeGroote Sch Business, 1280 Main St West, Hamilton, ON L8S 4M4, Canada

Source:

COMPUTERS & CHEMICAL ENGINEERING | 2025 / Volume 192

Keywords:

Supply chain management; Reinforcement learning; Risk management; Worst-case criterion; Closed-loop supply chain; Supply chain simulation; SUPPLY-CHAIN; ORDERING MANAGEMENT; INVENTORY CONTROL; PROCESS SYSTEMS; BIG DATA; OPTIMIZATION; UNCERTAINTY; MODEL; NETWORK;

DOI:

10.1016/j.compchemeng.2024.108912

Chinese Library Classification number:

TP39 [Computer applications];

Discipline classification code:

081203 ; 0835 ;

Abstract:

Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, Q̂-learning and beta-pessimistic Q-learning, are examined against conventional Q-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that Q-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned beta-pessimistic Q-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.
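
The abstract does not state the update rules used in the paper. As a rough illustration only, the sketch below shows one commonly cited tabular form of a beta-pessimistic Q-learning update, in which the bootstrap target blends the best and worst next-state action values. The function names, the list-of-lists Q-table, and the default values of `alpha`, `gamma`, and `beta` are all assumptions made for this example and are not taken from the paper.

```python
from typing import List


def beta_pessimistic_target(reward: float, next_q: List[float],
                            gamma: float, beta: float) -> float:
    """Return the bootstrap target for one transition.

    Assumed form (not confirmed by the abstract): the next-state value is a
    convex mix of the best and the worst action value, weighted by beta.
    beta = 0 gives the conventional Q-learning target; beta = 1 is fully
    pessimistic.
    """
    best = max(next_q)
    worst = min(next_q)
    return reward + gamma * ((1.0 - beta) * best + beta * worst)


def q_update(q: List[List[float]], state: int, action: int, reward: float,
             next_state: int, alpha: float = 0.1, gamma: float = 0.95,
             beta: float = 0.5) -> float:
    """Apply one in-place tabular update and return the new Q-value."""
    target = beta_pessimistic_target(reward, q[next_state], gamma, beta)
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]


if __name__ == "__main__":
    # Hypothetical two-state, two-action table, for illustration only.
    table = [[0.0, 0.0], [1.0, -1.0]]
    print(q_update(table, state=0, action=1, reward=2.0, next_state=1,
                   alpha=0.5, gamma=1.0, beta=0.5))
```

A larger `beta` puts more weight on the worst next-state value, which fits the abstract's observation that the robust models make more conservative inventory decisions.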

Pages: 17