Efficient Algorithms for Stochastic Repeated Second-price Auctions

Cited by: 0
Authors
Achddou, Juliette [1 ]
Cappe, Olivier [2 ]
Garivier, Aurelien [3 ]
Affiliations
[1] Univ PSL, ENS Paris, 1000mercis Grp, INRIA, Paris, France
[2] Univ PSL, ENS Paris, CNRS, INRIA, Paris, France
[3] ENS Lyon, INRIA, CNRS, UMPA, Lyon, France
Source
ALGORITHMIC LEARNING THEORY, 2021, Vol. 132
Keywords
bandits; online learning; auctions
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Developing efficient sequential bidding strategies for repeated auctions is an important practical challenge in various marketing tasks. In this setting, the bidding agent obtains information, on both the value of the item for sale and the behavior of the other bidders, only when she wins the auction. Standard bandit theory does not apply to this problem due to the presence of action-dependent censoring. In this work, we consider second-price auctions and propose novel, efficient UCB-like algorithms for this task. These algorithms are analyzed in the stochastic setting, assuming regularity of the distribution of the opponents' bids. We provide regret upper bounds that quantify the improvement over the baseline algorithm proposed in the literature. The improvement is particularly significant when the value of the auctioned item is low, yielding a dramatic reduction in the order of the worst-case regret. We further provide the first parametric lower bound for this problem, which applies to generic UCB-like strategies. As an alternative, we propose more explainable strategies that are reminiscent of the Explore Then Commit bandit algorithm. We give a critical analysis of this class of strategies, showing both its important advantages and its limitations. In particular, we establish a minimax lower bound and propose a nearly minimax-optimal instance of this class.
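To make the feedback structure concrete, here is a minimal simulation sketch (in Python) of a generic UCB-style bidding heuristic for this setting. It is not one of the algorithms analyzed in the paper: the uniform distribution of the highest opposing bid, the Bernoulli item values, the horizon, and the Hoeffding-style confidence bonus are all illustrative assumptions. Only the censoring pattern, in which the value and the opposing bid are observed solely on wins, mirrors the setting described above.

```python
import numpy as np

def run_ucb_bidding(T, mean_value, rng):
    """Simulate a UCB-style bidder in repeated second-price auctions.

    At each round the bidder submits an optimistic estimate (empirical mean
    of the values observed so far plus a confidence bonus) of the item's
    expected value.  The realized value and the highest opposing bid are
    revealed only on wins, reproducing the action-dependent censoring
    described in the abstract.
    """
    wins, value_sum, utility = 0, 0.0, 0.0
    for t in range(1, T + 1):
        if wins == 0:
            bid = 1.0                                # bid the maximum until the first win
        else:
            bonus = np.sqrt(2.0 * np.log(t) / wins)  # illustrative Hoeffding-style bonus
            bid = min(1.0, value_sum / wins + bonus)

        m = rng.uniform(0.0, 1.0)                    # highest opposing bid, assumed U[0, 1]
        if bid > m:                                  # win: pay the second price m ...
            v = rng.binomial(1, mean_value)          # ... and observe the realized value
            wins += 1
            value_sum += v
            utility += v - m
        # on a loss, nothing is observed (censored feedback)
    return utility, wins

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, mean_value = 100_000, 0.3
    utility, wins = run_ucb_bidding(T, mean_value, rng)
    # Under these assumptions, an oracle bidding the true expected value 0.3
    # earns 0.3**2 / 2 = 0.045 per round in expectation; the gap to that
    # benchmark, accumulated over T rounds, is the regret.
    print(f"wins: {wins}, average utility per round: {utility / T:.4f}")
```

The design choice illustrated here is the one motivating UCB-like strategies in this setting: since the dominant-strategy bid in a second-price auction is the item's expected value, bidding an upper confidence bound on that value keeps the bid optimistic and secures enough wins, and hence enough uncensored observations, for the estimate to converge. The paper's actual algorithms and their regret analysis differ from this sketch.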
Pages: 52