Learning to Order for Inventory Systems with Lost Sales and Uncertain Supplies

Cited by: 2
Authors
Chen, Boxiao [1]
Jiang, Jiashuo [2]
Zhang, Jiawei [3]
Zhou, Zhengyuan [3]
Affiliations
[1] Univ Illinois, Coll Business Adm, Chicago, IL 60607 USA
[2] Hong Kong Univ Sci & Technol, Dept Ind Engn & Decis Analyt, Hong Kong, Peoples R China
[3] NYU, Stern Sch Business, New York, NY 10012 USA
Funding
U.S. National Science Foundation;
Keywords
lost sales; lead time; supply uncertainty; online learning; censored data; BASE-STOCK POLICIES; RANDOM YIELD; ASYMPTOTIC OPTIMALITY; DIVERSIFICATION; CAPACITY; DEMAND; MODELS; REPLENISHMENT; ALGORITHMS; DECISIONS;
DOI
10.1287/mnsc.2022.02476
Chinese Library Classification number
C93 [Management Science];
Discipline classification codes
12 ; 1201 ; 1202 ; 120202 ;
Abstract
We consider a stochastic lost-sales inventory control system with lead time $L$ over a planning horizon $T$. Supply is uncertain and is a function of the order quantity (because of random yield, random capacity, etc.). We aim to minimize the $T$-period cost, a problem that is known to be computationally intractable even when the demand and supply distributions are known. In this paper, we assume that both the demand and supply distributions are unknown and develop a computationally efficient online learning algorithm. We show that our algorithm achieves a regret (i.e., the performance gap between the cost of our algorithm and that of an optimal policy over $T$ periods) of $O(L + \sqrt{T})$ when $L \geq \Omega(\log T)$. We do so by (1) showing that, for any $L \geq 0$, our algorithm's cost is higher by at most $O(L + \sqrt{T})$ than that of an optimal constant-order policy under complete information (a widely used algorithm) and (2) leveraging the latter's known performance guarantee from the existing literature. To the best of our knowledge, a finite-sample $O(\sqrt{T})$ (and polynomial in $L$) regret bound benchmarked against an optimal policy was not previously known in the online inventory control literature. A key challenge in this learning problem is that both demand and supply data can be censored; hence, only truncated values are observable. We circumvent this challenge by showing that the data generated under an order quantity $q_2$ allow us to simulate the performance of not only $q_2$ but also $q_1$ for all $q_1 < q_2$, a key observation for obtaining sufficient information even under data censoring. By establishing a high-probability coupling argument, we are able to evaluate and compare the performance of different order policies at their steady state within a finite time horizon. Because the problem lacks convexity, commonly used learning algorithms, such as stochastic gradient descent and bisection, cannot be applied; instead, we develop an active elimination method that adaptively rules out suboptimal solutions.
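The censoring observation and the elimination idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm. It assumes a random-capacity supply model (delivered quantity = min(order, capacity)), integer demand and capacity, zero initial inventory, and a surrogate cost `h*leftover - p*sales`, which differs from holding-plus-lost-sales cost only by the term `p*demand` that does not depend on the order quantity. All function names, the cost parameters `h` and `p`, and the confidence width governed by `scale` are hypothetical choices made for illustration.

```python
import math
import random

def run_censored(q, demands, capacities):
    """Run constant order q; return only censored observations (received, sales)."""
    on_hand, received_obs, sales_obs = 0, [], []
    for d, c in zip(demands, capacities):
        received = min(q, c)        # supply is truncated at the order quantity
        on_hand += received
        sales = min(d, on_hand)     # demand is truncated at the stock on hand
        on_hand -= sales
        received_obs.append(received)
        sales_obs.append(sales)
    return received_obs, sales_obs

def replay(q1, received_obs, sales_obs, h=1.0, p=4.0):
    """Replay a smaller order q1 from data censored under some q2 >= q1.

    Returns the average surrogate cost h*leftover - p*sales and the sales path.
    """
    on_hand, total, sales_path = 0, 0.0, []
    for r2, s2 in zip(received_obs, sales_obs):
        on_hand += min(q1, r2)      # min(q1, min(q2, C)) == min(q1, C)
        sales = min(s2, on_hand)    # stock under q1 never exceeds stock under q2
        on_hand -= sales
        total += h * on_hand - p * sales
        sales_path.append(sales)
    return total / len(sales_obs), sales_path

def eliminate(candidates, epochs, epoch_len, draw_demand, draw_capacity, scale=1.0):
    """Generic successive elimination over a grid of constant order quantities."""
    active = sorted(candidates)
    for k in range(1, epochs + 1):
        n = epoch_len * k
        demands = [draw_demand() for _ in range(n)]
        capacities = [draw_capacity() for _ in range(n)]
        # Order the largest surviving quantity so every survivor can be replayed.
        rec, sal = run_censored(active[-1], demands, capacities)
        est = {q: replay(q, rec, sal)[0] for q in active}
        radius = scale * math.sqrt(math.log(n + 1) / n)  # hypothetical confidence width
        best = min(est.values())
        active = [q for q in active if est[q] <= best + 2 * radius]
    return active

# Usage: data censored under q2 = 8 reproduce the sales path of q1 = 5 exactly.
random.seed(0)
D = [random.randint(0, 10) for _ in range(200)]
C = [random.randint(0, 12) for _ in range(200)]
rec2, sal2 = run_censored(8, D, C)
assert replay(5, rec2, sal2)[1] == run_censored(5, D, C)[1]
```

The replay works because stock under the smaller order never exceeds stock under the larger one, so every quantity the smaller policy needs is already visible in the truncated data. The sketch restarts each epoch from empty stock and ignores the lead time and the steady-state coupling argument that the abstract describes.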
Pages: 8631-8646
Number of pages: 17