Online Network Revenue Management Using Thompson Sampling

Cited by: 123
Authors
Ferreira, Kris Johnson [1 ]
Simchi-Levi, David [2 ,3 ]
Wang, He [4 ]
Affiliations
[1] Harvard Sch Business, Boston, MA 02163 USA
[2] MIT, Dept Civil & Environm Engn, Inst Data Syst & Soc, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] MIT, Ctr Operat Res, Cambridge, MA 02139 USA
[4] Georgia Inst Technol, H Milton Stewart Sch Ind & Syst Engn, Atlanta, GA 30332 USA
Keywords
revenue management; dynamic pricing; demand learning; multiarmed bandit; Thompson sampling; machine learning; asymptotic behavior; demand; algorithm; policies
DOI
10.1287/opre.2018.1755
Chinese Library Classification (CLC)
C93 [Management]
Discipline classification codes
12; 1201; 1202; 120202
Abstract
We consider a price-based network revenue management problem in which a retailer aims to maximize revenue from multiple products with limited inventory over a finite selling season. As is common in practice, we assume the demand function contains unknown parameters that must be learned from sales data. In the presence of these unknown demand parameters, the retailer faces a trade-off commonly referred to as the "exploration-exploitation trade-off." Toward the beginning of the selling season, the retailer may offer several different prices to try to learn demand at each price ("exploration" objective). Over time, the retailer can use this knowledge to set a price that maximizes revenue throughout the remainder of the selling season ("exploitation" objective). We propose a class of dynamic pricing algorithms that builds on the simple, yet powerful, machine learning technique known as "Thompson sampling" to address the challenge of balancing the exploration-exploitation trade-off in the presence of inventory constraints. Our algorithms have both strong theoretical performance guarantees and promising numerical results when compared with other algorithms developed for similar settings. Moreover, we show how our algorithms can be extended for use in general multiarmed bandit problems with resource constraints as well as in applications in other revenue management settings and beyond.
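To make the exploration-exploitation loop described in the abstract concrete, the following is a minimal illustrative sketch (not the authors' implementation) of Thompson sampling for pricing under an inventory constraint: a single product, a small menu of candidate prices, Bernoulli demand per period with Beta priors, and a linear program, solved with scipy.optimize.linprog, that caps expected inventory use per remaining period. All prices, demand probabilities, horizon, and inventory values below are assumptions chosen for illustration only.

# Illustrative sketch only: single-product Thompson sampling pricing with an
# inventory constraint. All numbers and names are assumed for the example.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)

prices = np.array([29.0, 34.0, 39.0, 44.0])   # candidate price menu (assumed)
true_theta = np.array([0.8, 0.6, 0.3, 0.1])   # unknown purchase prob. per price (simulation only)
T, inventory = 500, 150                       # selling horizon and starting inventory (assumed)

# Beta(1, 1) priors on the purchase probability at each candidate price
alpha = np.ones_like(prices)
beta = np.ones_like(prices)

revenue = 0.0
for t in range(T):
    if inventory <= 0:
        break

    # 1) Exploration: sample demand parameters from the current posterior.
    theta = rng.beta(alpha, beta)

    # 2) Solve a small LP: time fractions x_k over prices that maximize sampled
    #    expected revenue while expected sales stay within the per-period
    #    inventory budget for the remaining horizon.
    budget = inventory / (T - t)
    res = linprog(
        c=-(prices * theta),                       # maximize revenue -> minimize negative
        A_ub=np.vstack([theta, np.ones_like(theta)]),
        b_ub=np.array([budget, 1.0]),
        bounds=[(0.0, 1.0)] * len(prices),
        method="highs",
    )
    x = np.clip(res.x, 0.0, 1.0)

    # 3) Offer one price this period, drawn according to the LP solution; any
    #    leftover probability mass corresponds to not offering the product.
    probs = np.append(x, max(0.0, 1.0 - x.sum()))
    probs = probs / probs.sum()
    k = rng.choice(len(prices) + 1, p=probs)
    if k == len(prices):
        continue  # "do not offer" this period when inventory is scarce

    # 4) Exploitation improves over time: observe a (simulated) sale and
    #    update the posterior for the offered price.
    sale = rng.random() < true_theta[k]
    alpha[k] += sale
    beta[k] += 1 - sale
    if sale:
        inventory -= 1
        revenue += prices[k]

print(f"total revenue: {revenue:.0f}, inventory left: {inventory}")

The sketch only conveys the sample-posterior, solve-LP, offer-price, update-posterior loop; the paper's algorithms treat multiple products sharing limited resources and carry the performance guarantees mentioned in the abstract.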
Pages: 1586-1602
Number of pages: 17