Explore/Exploit Schemes for Web Content Optimization

Cited by: 28

Authors:

Agarwal, Deepak

Chen, Bee-Chung

Elango, Pradheep

Affiliations:

Source:

2009 9th IEEE International Conference on Data Mining | 2009

Keywords:

DOI:

10.1109/ICDM.2009.52

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

We propose novel multi-armed bandit (explore/exploit) schemes to maximize total clicks on a content module published regularly on Yahoo! Intuitively, one can "explore" each candidate item by displaying it to a small fraction of user visits to estimate the item's click-through rate (CTR), and then "exploit" high-CTR items in order to maximize clicks. While bandit methods that seek the optimal trade-off between explore and exploit have been studied for decades, existing solutions are not satisfactory for web content publishing applications, where a dynamic set of items with short lifetimes, delayed feedback, and non-stationary reward (CTR) distributions are typical. In this paper, we develop a Bayesian solution and extend several existing schemes to our setting. Through extensive evaluation with nine bandit schemes, we show that our Bayesian solution is uniformly better in several scenarios. We also study the empirical characteristics of our schemes and provide useful insights on the strengths and weaknesses of each. Finally, we validate our results with a "side-by-side" comparison of schemes through live experiments conducted on a random sample of real user visits to Yahoo!
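The explore/exploit intuition in the abstract can be illustrated with a minimal epsilon-greedy sketch. This is a standard textbook baseline, not the Bayesian scheme the paper proposes. The function name `epsilon_greedy`, the parameters `true_ctrs` and `epsilon`, and the simulated click model are all assumptions made for illustration. The sketch also assumes a fixed item set, immediate feedback, and stationary CTRs, which are exactly the simplifications the abstract says do not hold in web content publishing.

```python
import random


def epsilon_greedy(true_ctrs, n_visits, epsilon=0.1, seed=0):
    """Simulate serving one item per visit; return (clicks, shows) per item.

    true_ctrs is a hypothetical list of per-item click probabilities used
    only to simulate user feedback; a real system would not know them.
    """
    rng = random.Random(seed)
    n_items = len(true_ctrs)
    clicks = [0] * n_items
    shows = [0] * n_items

    def estimated_ctr(i):
        # Items never shown get an optimistic estimate so each is tried once.
        return clicks[i] / shows[i] if shows[i] else 1.0

    for _ in range(n_visits):
        if rng.random() < epsilon:
            # Explore: a small fraction of visits sees a random item.
            item = rng.randrange(n_items)
        else:
            # Exploit: the remaining visits see the best item so far.
            item = max(range(n_items), key=estimated_ctr)
        shows[item] += 1
        # Simulated click, assumed to arrive immediately.
        if rng.random() < true_ctrs[item]:
            clicks[item] += 1
    return clicks, shows


if __name__ == "__main__":
    clicks, shows = epsilon_greedy([0.02, 0.05, 0.03], n_visits=10_000)
    for i, (c, s) in enumerate(zip(clicks, shows)):
        print(f"item {i}: shown {s} times, {c} clicks")
```

A fixed `epsilon` keeps exploring at the same rate forever, which wastes visits once the estimates are reliable; that kind of inefficiency is part of what motivates more careful trade-offs such as the schemes the paper evaluates.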


Pages: 1 / 10

Page count: 10
