Efficient Algorithms for Stochastic Repeated Second-price Auctions

Cited by: 0
Authors
Achddou, Juliette [1 ]
Cappe, Olivier [2 ]
Garivier, Aurelien [3 ]
Affiliations
[1] Univ PSL, ENS Paris, 1000mercis Grp, INRIA, Paris, France
[2] Univ PSL, ENS Paris, CNRS, INRIA, Paris, France
[3] ENS Lyon, INRIA, CNRS, UMPA, Lyon, France
Source
ALGORITHMIC LEARNING THEORY, 2021, Vol. 132
Keywords
bandits; online learning; auctions
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Developing efficient sequential bidding strategies for repeated auctions is an important practical challenge in various marketing tasks. In this setting, the bidding agent obtains information, on both the value of the item for sale and the behavior of the other bidders, only when she wins the auction. Standard bandit theory does not apply to this problem due to the presence of action-dependent censoring. In this work, we consider second-price auctions and propose novel, efficient UCB-like algorithms for this task. These algorithms are analyzed in the stochastic setting, assuming regularity of the distribution of the opponents' bids. We provide regret upper bounds that quantify the improvement over the baseline algorithm proposed in the literature. The improvement is particularly significant when the value of the auctioned item is low, yielding a dramatic reduction in the order of the worst-case regret. We further provide the first parametric lower bound for this problem, which applies to generic UCB-like strategies. As an alternative, we propose more explainable strategies that are reminiscent of the Explore Then Commit bandit algorithm. We give a critical analysis of this class of strategies, showing both its important advantages and its limitations. In particular, we establish a minimax lower bound and propose a nearly minimax-optimal instance of this class.
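To make the feedback structure concrete, here is a minimal simulation sketch (in Python) of a generic UCB-style bidding heuristic for this setting. It is not one of the algorithms analyzed in the paper: the uniform distribution of the highest opposing bid, the Bernoulli item values, the horizon, and the Hoeffding-style confidence bonus are all illustrative assumptions. Only the censoring pattern, in which the value and the opposing bid are observed solely on wins, mirrors the setting described above.

```python
import numpy as np

def run_ucb_bidding(T, mean_value, rng):
    """Simulate a UCB-style bidder in repeated second-price auctions.

    At each round the bidder submits an optimistic estimate (empirical mean
    of the values observed so far plus a confidence bonus) of the item's
    expected value.  The realized value and the highest opposing bid are
    revealed only on wins, reproducing the action-dependent censoring
    described in the abstract.
    """
    wins, value_sum, utility = 0, 0.0, 0.0
    for t in range(1, T + 1):
        if wins == 0:
            bid = 1.0                                # bid the maximum until the first win
        else:
            bonus = np.sqrt(2.0 * np.log(t) / wins)  # illustrative Hoeffding-style bonus
            bid = min(1.0, value_sum / wins + bonus)

        m = rng.uniform(0.0, 1.0)                    # highest opposing bid, assumed U[0, 1]
        if bid > m:                                  # win: pay the second price m ...
            v = rng.binomial(1, mean_value)          # ... and observe the realized value
            wins += 1
            value_sum += v
            utility += v - m
        # on a loss, nothing is observed (censored feedback)
    return utility, wins

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, mean_value = 100_000, 0.3
    utility, wins = run_ucb_bidding(T, mean_value, rng)
    # Under these assumptions, an oracle bidding the true expected value 0.3
    # earns 0.3**2 / 2 = 0.045 per round in expectation; the gap to that
    # benchmark, accumulated over T rounds, is the regret.
    print(f"wins: {wins}, average utility per round: {utility / T:.4f}")
```

The design choice illustrated here is the one motivating UCB-like strategies in this setting: since the dominant-strategy bid in a second-price auction is the item's expected value, bidding an upper confidence bound on that value keeps the bid optimistic and secures enough wins, and hence enough uncensored observations, for the estimate to converge. The paper's actual algorithms and their regret analysis differ from this sketch.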
Pages: 52