Risk-averse supply chain management via robust reinforcement learning

Cited by: 3

Authors:

Wang, Jing [1]

Swartz, Christopher L. E. [2]

Huang, Kai [3]

Affiliations:

[1] McMaster Univ, Sch Computat Sci & Engn, 1280 Main St West, Hamilton, ON L8S 4K1, Canada

[2] McMaster Univ, Dept Chem Engn, 1280 Main St West, Hamilton, ON L8S 4L7, Canada

[3] McMaster Univ, DeGroote Sch Business, 1280 Main St West, Hamilton, ON L8S 4M4, Canada

Source:

COMPUTERS & CHEMICAL ENGINEERING | 2025, Vol. 192

Keywords:

Supply chain management; Reinforcement learning; Risk management; Worst-case criterion; Closed-loop supply chain; Supply chain simulation; SUPPLY-CHAIN; ORDERING MANAGEMENT; INVENTORY CONTROL; PROCESS SYSTEMS; BIG DATA; OPTIMIZATION; UNCERTAINTY; MODEL; NETWORK;

DOI:

10.1016/j.compchemeng.2024.108912

Chinese Library Classification (CLC) number:

TP39 [Computer applications];

Discipline classification codes:

081203 ; 0835 ;

Abstract:

Classical reinforcement learning (RL) may suffer performance degradation when the environment deviates from training conditions, limiting its application in risk-averse supply chain management. This work explores using robust RL in supply chain operations to hedge against environment inconsistencies and changes. Two robust RL algorithms, Q̂-learning and beta-pessimistic Q-learning, are examined against conventional Q-learning and a baseline order-up-to inventory policy. Furthermore, this work extends RL applications from forward to closed-loop supply chains. Two case studies are conducted using a supply chain simulator developed with agent-based modeling. The results show that Q-learning can outperform the baseline policy under normal conditions, but notably degrades under environment deviations. By comparison, the robust RL models tend to make more conservative inventory decisions to avoid large shortage penalties. Specifically, fine-tuned beta-pessimistic Q-learning can achieve good performance under normal conditions and maintain robustness against moderate environment inconsistencies, making it suitable for risk-averse decision-making.
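The abstract does not give update rules, so the following is only a minimal sketch of how a beta-pessimistic Q-learning target is commonly formulated in the RL literature: the bootstrapped value blends the best-case and worst-case action values of the next state, weighted by beta. The function names, the tabular `q_table` layout, and the default values of `alpha`, `gamma`, and `beta` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def beta_pessimistic_target(q_next, reward, gamma=0.95, beta=0.3):
    """Blend best- and worst-case next-state values (assumed formulation).

    beta = 0 gives the conventional Q-learning target (max only);
    beta = 1 bootstraps from the worst action value only.
    """
    optimistic = np.max(q_next)   # value if the best action is taken next
    pessimistic = np.min(q_next)  # value if the worst action is taken next
    return reward + gamma * ((1.0 - beta) * optimistic + beta * pessimistic)


def q_update(q_table, s, a, reward, s_next, alpha=0.1, gamma=0.95, beta=0.0):
    """One tabular update toward the beta-pessimistic target (in place)."""
    target = beta_pessimistic_target(q_table[s_next], reward, gamma, beta)
    q_table[s, a] += alpha * (target - q_table[s, a])
    return q_table
```

With beta set to 0 the update reduces to conventional Q-learning, which is one way to read the abstract's comparison: raising beta lowers the value of states whose worst outcomes are costly, which would push a learned policy toward more conservative inventory decisions.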


Pages: 17
