Online Learning and Pricing for Multiple Products With Reference Price Effects

Cited by: 0
Authors
Ji, Sheng [1]
Yang, Yi [1]
Affiliations
[1] Zhejiang Univ, Sch Management, Hangzhou, Peoples R China
Keywords
multi-armed bandit; multiple products; online learning; pricing; reference price effect; revenue management; MANAGEMENT; STRATEGIES;
DOI
10.1002/nav.22240
Chinese Library Classification number
C93 [Management Science]; O22 [Operations Research];
Discipline classification code
070105 ; 12 ; 1201 ; 1202 ; 120202 ;
Abstract
We consider the dynamic pricing problem of a monopolist seller who sells a set of mutually substitutable products over a finite time horizon. Customer demand is sensitive to the price of each individual product and the reference price which is formed from a comparison among the prices of all products. To maximize the total expected profit, the seller needs to determine the selling price of each product and also select a reference product (to be displayed) that affects the consumer's reference price. However, the seller initially knows neither the demand function nor the optimal reference product, but can learn them from past observations on the fly. As such, the seller faces the classical trade-off between exploration (learning the demand function and reference price) and exploitation (using what has been learned thus far to maximize revenue). We propose a rate-optimal dynamic learning-and-pricing algorithm that integrates iterative least squares estimation and bandit control techniques in a seamless fashion. We show that the cumulative regret, that is, the expected revenue loss caused by not using the optimal policy over $T$ periods, is upper bounded by $\widetilde{O}(n^2\sqrt{T})$, where $\widetilde{O}(\cdot)$ hides any logarithmic factors. We also establish the regret lower bound (for any learning policies) to be $\Omega(n^2\sqrt{T})$. We then generalize our analysis to a more general demand model. Our algorithm performs consistently well numerically, outperforming an exploration-exploitation benchmark. The use of price experimentation and estimation techniques could be readily applied in real retail management.
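The least squares ingredient mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a plain linear demand model d = a - B p, leaves out the reference price effect and the bandit selection of a reference product, and runs a single exploration phase rather than iterative estimation. The function names and all numeric values are hypothetical.

```python
import numpy as np

def fit_linear_demand(prices, demands):
    """Least-squares fit of d = a - B p from observed (price, demand) rows."""
    # Design matrix [1, -p] so the fitted coefficients are a and B directly.
    X = np.hstack([np.ones((prices.shape[0], 1)), -prices])
    theta, *_ = np.linalg.lstsq(X, demands, rcond=None)  # shape (n + 1, n)
    a_hat = theta[0]       # intercept of each product's demand
    B_hat = theta[1:].T    # B_hat[i, j]: effect of price j on demand i
    return a_hat, B_hat

def revenue_maximizing_prices(a, B):
    """Maximize p^T (a - B p); the first-order condition is (B + B^T) p = a."""
    return np.linalg.solve(B + B.T, a)

# Hypothetical ground truth for two substitutable products (illustrative only).
a_true = np.array([10.0, 8.0])
B_true = np.array([[2.0, -0.5], [-0.3, 1.5]])

rng = np.random.default_rng(0)
explore_prices = rng.uniform(1.0, 4.0, size=(200, 2))  # price experimentation
noise = rng.normal(0.0, 0.1, size=(200, 2))
observed = a_true - explore_prices @ B_true.T + noise   # noisy demand

a_hat, B_hat = fit_linear_demand(explore_prices, observed)
p_hat = revenue_maximizing_prices(a_hat, B_hat)    # prices from the estimate
p_star = revenue_maximizing_prices(a_true, B_true)  # prices under the truth
```

Comparing `p_hat` with `p_star` shows how price experimentation followed by estimation recovers near-optimal prices in this simplified setting; the first-order condition yields a maximum here because `B + B.T` is positive definite for the chosen values.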
Pages: 677-693
Number of pages: 17
Related papers
55 in total
[1]   Near-Optimal Regret Bounds for Thompson Sampling [J].
Agrawal, Shipra ;
Goyal, Navin .
JOURNAL OF THE ACM, 2017, 64 (05)
[2]   Finite-time analysis of the multiarmed bandit problem [J].
Auer, P ;
Cesa-Bianchi, N ;
Fischer, P .
MACHINE LEARNING, 2002, 47 (2-3) :235-256
[3]   Personalized Dynamic Pricing with Machine Learning: High-Dimensional Features and Heterogeneous Elasticity [J].
Ban, Gah-Yi ;
Keskin, N. Bora .
MANAGEMENT SCIENCE, 2021, 67 (09) :5549-5568
[4]   Berger J., 2013, Statistical Decision Theory and Bayesian Analysis
[5]   On the (Surprising) Sufficiency of Linear Models for Dynamic Pricing with Demand Learning [J].
Besbes, Omar ;
Zeevi, Assaf .
MANAGEMENT SCIENCE, 2015, 61 (04) :723-739
[6]   Blind Network Revenue Management [J].
Besbes, Omar ;
Zeevi, Assaf .
OPERATIONS RESEARCH, 2012, 60 (06) :1537-1550
[7]   On the Minimax Complexity of Pricing in a Changing Environment [J].
Besbes, Omar ;
Zeevi, Assaf .
OPERATIONS RESEARCH, 2011, 59 (01) :66-79
[8]   Dynamic Pricing Without Knowing the Demand Function: Risk Bounds and Near-Optimal Algorithms [J].
Besbes, Omar ;
Zeevi, Assaf .
OPERATIONS RESEARCH, 2009, 57 (06) :1407-1420
[9]   Dynamic pricing with stochastic reference effects based on a finite memory window [J].
Bi, Wenjie ;
Li, Guo ;
Liu, Mengqi .
INTERNATIONAL JOURNAL OF PRODUCTION RESEARCH, 2017, 55 (12) :3331-3348
[10]   Contextual Effects of Reference Prices in Retail Advertisements [J].
Biswas, A ;
Blair, EA .
JOURNAL OF MARKETING, 1991, 55 (03) :1-12