Bayesian Incentive-Compatible Bandit Exploration

被引:16
作者
Mansour, Yishay [1 ]
Slivkins, Aleksandrs [2 ]
Syrgkanis, Vasilis [3 ]
机构
[1] Tel Aviv Univ, Sch Comp Sci, IL-6997801 Tel Aviv, Israel
[2] Microsoft Res, New York, NY 10011 USA
[3] Microsoft Res, Cambridge, MA 02142 USA
关键词
mechanism design; multiarmed bandits; regret; Bayesian incentive-compatibility; CLINICAL-TRIAL DESIGN; MULTIARMED BANDIT; ALGORITHMS; SIGNATURE; REGRET; BOUNDS; ERA;
D O I
10.1287/opre.2019.1949
中图分类号
C93 [管理学];
学科分类号
12 ; 1201 ; 1202 ; 120202 ;
摘要
As self-interested individuals ("agents") make decisions over time, they utilize information revealed by other agents in the past and produce information that may help agents in the future. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in medical decisions. Each agent would like to exploit: select the best action given the current information, but would prefer the previous agents to explore: try out various alternatives to collect information. A social planner, by means of a carefully designed recommendation policy, can incentivize the agents to balance the exploration and exploitation so as to maximize social welfare. We model the planner's recommendation policy as a multiarm bandit algorithm under incentive-compatibility constraints induced by agents' Bayesian priors. We design a bandit algorithm which is incentive-compatible and has asymptotically optimal performance, as expressed by regret. Further, we provide a black-box reduction from an arbitrary multiarm bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings that incorporate contexts and arbitrary partial feedback.
引用
收藏
页码:1132 / 1161
页数:30
相关论文
共 50 条
  • [31] An Incentive-Compatible Routing Protocol for Delay-Tolerant Networks Using Second-Price Sealed-Bid Auction Mechanism
    Amir Babazadeh Nanehkaran
    Mohammad Hossein Rezvani
    Wireless Personal Communications, 2021, 121 : 1547 - 1576
  • [32] Extremal incentive compatible transfers
    Kos, Nenad
    Messner, Matthias
    JOURNAL OF ECONOMIC THEORY, 2013, 148 (01) : 134 - 164
  • [33] Incentive Compatible Imbalance Settlement
    Haring, Tobias W.
    Kirschen, Daniel S.
    Andersson, Goeran
    IEEE TRANSACTIONS ON POWER SYSTEMS, 2015, 30 (06) : 3338 - 3346
  • [34] Incentive compatible regression learning
    Dekel, Ofer
    Fischer, Felix
    Procaccia, Ariel D.
    JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2010, 76 (08) : 759 - 777
  • [35] Incentive compatible pool-based electricity market design and implementation: A Bayesian mechanism design approach
    Zou, Peng
    Chen, Qixin
    Xia, Qing
    He, Chang
    Kang, Chongqing
    APPLIED ENERGY, 2015, 158 : 508 - 518
  • [36] Incentive Compatible Allocation Without Money
    Cavallo, Ruggiero
    ACM SIGECOM EXCHANGES, 2014, 13 (01) : 68 - 71
  • [37] Distributed Exploration Bandit For Dynamic Spectrum Access
    Xue, Yuan
    Zhou, Pan
    Li, Jun
    Jiang, Tao
    2015 INTERNATIONAL CONFERENCE ON WIRELESS COMMUNICATIONS & SIGNAL PROCESSING (WCSP), 2015,
  • [38] The ABC mechanism: an incentive compatible payoff mechanism for elicitation of outcome and probability transformations
    Li, Yi
    EXPERIMENTAL ECONOMICS, 2021, 24 (03) : 1019 - 1046
  • [39] Quantum Bandit With Amplitude Amplification Exploration in an Adversarial Environment
    Cho, Byungjin
    Xiao, Yu
    Hui, Pan
    Dong, Daoyi
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2024, 36 (01) : 311 - 317
  • [40] Logarithmic Weak Regret of Non-Bayesian Restless Multi-Armed Bandit
    Liu, Haoyang
    Liu, Keqin
    Zhao, Qing
    2011 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 2011, : 1968 - 1971