Online Learning for Dual-Index Policies in Dual-Sourcing Systems

Cited by: 4

Authors:

Tang, Jingwen ^{[1]}

Chen, Boxiao ^{[2]}

Shi, Cong ^{[3]}

Affiliations:

[1] Univ Michigan, Ind & Operat Engn, Ann Arbor, MI 48109 USA

[2] Univ Illinois, Coll Business Adm, Chicago, IL 60607 USA

[3] Univ Miami, Miami Herbert Business Sch, Management Sci, Coral Gables, FL 33146 USA

Source:

M&SOM-MANUFACTURING & SERVICE OPERATIONS MANAGEMENT | 2024 / Vol. 26 / No. 02

Keywords:

inventory; dual sourcing; dual-index policy; learning; bandits; sample average approximation; INVENTORY SYSTEMS; LOST SALES; ALGORITHMS; OPTIMALITY; BOUNDS

DOI:

10.1287/msom.2022.0323

Chinese Library Classification (CLC) number:

C93 [Management Science]

Discipline classification codes:

12; 1201; 1202; 120202

Abstract:

Problem definition: We consider a periodic-review dual-sourcing inventory system with a regular source (lower unit cost but longer lead time) and an expedited source (shorter lead time but higher unit cost) under carried-over supply and backlogged demand. Unlike existing literature, we assume that the firm does not have access to the demand distribution a priori and relies solely on past demand realizations. Even with complete information on the demand distribution, it is well known in the literature that the optimal inventory replenishment policy is complex and state dependent. Therefore, we focus our attention on a class of popular, easy-to-implement, and near-optimal heuristic policies called the dual-index policy. Methodology/results: The performance measure is the regret, defined as the cost difference of any feasible learning algorithm against the full information optimal dual-index policy. We develop a nonparametric online learning algorithm that admits a regret upper bound of $O(\sqrt{T \log T})$, which matches the regret lower bound for any feasible learning algorithms up to a logarithmic factor. Our algorithm integrates stochastic bandits and sample average approximation techniques in an innovative way. As part of our regret analysis, we explicitly prove that the underlying Markov chain is ergodic and converges to its steady state exponentially fast via coupling arguments, which could be of independent interest. Managerial implications: Our work provides practitioners with an easy-to-implement, robust, and provably good online decision support system for managing a dual-sourcing inventory system.


Pages: 758-774

Page count: 18
