Index Policies and Performance Bounds for Dynamic Selection Problems

Times Cited: 35
Authors
Brown, David B. [1 ]
Smith, James E. [2 ]
Affiliations
[1] Duke Univ, Fuqua Sch Business, Durham, NC 27708 USA
[2] Dartmouth Coll, Tuck Sch Business, Hanover, NH 03755 USA
Keywords
dynamic programming; restless bandits; Lagrangian relaxations; Gittins index; Whittle index; INFORMATION RELAXATIONS; OPTIMIZATION; ASSORTMENT; DUALITY;
DOI
10.1287/mnsc.2019.3342
CLC number (Chinese Library Classification)
C93 [Management];
Discipline classification codes
12; 1201; 1202; 120202
Abstract
We consider dynamic selection problems, where a decision maker repeatedly selects a set of items from a larger collection of available items. A classic example is the dynamic assortment problem with demand learning, where a retailer chooses items to offer for sale subject to a display space constraint. The retailer may adjust the assortment over time in response to the observed demand. These dynamic selection problems are naturally formulated as stochastic dynamic programs (DPs) but are difficult to solve because the optimal selection decisions depend on the states of all items. In this paper, we study heuristic policies for dynamic selection problems and provide upper bounds on the performance of an optimal policy that can be used to assess the performance of a heuristic policy. The policies and bounds that we consider are based on a Lagrangian relaxation of the DP that relaxes the constraint limiting the number of items that may be selected. We characterize the performance of the Lagrangian index policy and bound, and show that, under mild conditions, these policies and bounds are asymptotically optimal for problems with many items; mixed policies and tiebreaking play an essential role in the analysis of these index policies and can have a surprising impact on performance. We demonstrate these policies and bounds in two large-scale examples: a dynamic assortment problem with demand learning and an applicant screening problem.
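To make the approach described in the abstract concrete, the following is a minimal LaTeX sketch of a standard Lagrangian relaxation for a weakly coupled selection DP, in the spirit of the paper's description; the notation (N items, at most m selections per period, discount factor delta, multiplier lambda) is an illustrative assumption and is not taken from the paper itself.

\documentclass{article}
\usepackage{amsmath,amssymb}
\begin{document}
% Assumed setup: N items with states x_i; each period at most m items may be
% selected (a_i in {0,1}); item-level reward r_i(x_i, a_i); discount factor delta.
\[
  V(\mathbf{x}) = \max_{\mathbf{a}\in\{0,1\}^N,\;\sum_i a_i \le m}
  \mathbb{E}\!\left[\,\sum_{i=1}^{N} r_i(x_i,a_i) + \delta\, V(\mathbf{x}')\right].
\]
% Relaxing the coupling constraint with a multiplier lambda >= 0 charges lambda per
% selection and credits lambda*m per period, decoupling the DP across items and
% giving an upper bound on the optimal value:
\[
  V(\mathbf{x}) \le L(\lambda,\mathbf{x})
  = \frac{\lambda m}{1-\delta} + \sum_{i=1}^{N} V_i^{\lambda}(x_i),
  \qquad
  V_i^{\lambda}(x_i) = \max_{a_i\in\{0,1\}}
  \mathbb{E}\!\left[r_i(x_i,a_i) - \lambda a_i + \delta\, V_i^{\lambda}(x_i')\right].
\]
% Minimizing L(lambda, x) over lambda >= 0 tightens the bound; an index policy then
% selects, in each period, the m items whose single-item subproblems assign the
% largest marginal value to being selected.
\end{document}

This is only the generic form of such a relaxation; the paper's specific indices, tiebreaking rules, and asymptotic optimality conditions are developed in the full text.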
Pages: 3029-3050
Number of pages: 22