A Differentially Private Approach for Budgeted Combinatorial Multi-Armed Bandits

被引:0
|
作者
Wang, Hengzhi [1 ]
Cui, Laizhong [1 ]
Wang, En [2 ,3 ]
Liu, Jiangchuan [4 ]
机构
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Jilin Univ, Dept Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[4] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
基金
中国国家自然科学基金;
关键词
Combinatorial multi-armed bandits; differential privacy; online learning; truthfulness;
D O I
10.1109/TDSC.2024.3401836
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
As a fundamental tool for sequential decision-making, the Combinatorial Multi-Armed Bandits model (CMAB) has been extensively analyzed and applied in various online applications. However, the privacy concerns in budgeted CMAB are rarely investigated thus far. Few bandit algorithms have adequately addressed the privacy-preserving budgeted CMAB setting. Motivated by this, we study this setting using differential privacy as the formal measure of privacy. In this setting, playing an arm yields both a random reward and a random cost, and these values are kept private. In addition, multiple arms can be played in each round. The objective of the decision-maker is to minimize regret while subject to a budget constraint on the cumulative cost of all played arms. We demonstrate an exploration-exploitation-balanced bandit policy, which preserves the privacy of both rewards and costs under budgeted CMAB settings. This policy is proven differentially private and achieves an upper bound on regret. Furthermore, to provide incentives for the differentially private bandit policy so as to ensure that the reported costs are truthful, we introduce the concept of truthfulness and incorporate a payment mechanism that has been proven to be $\sigma$sigma-truthful. Numerical simulations based on multiple real-world datasets validate the theoretical findings and demonstrate the effectiveness of our policy compared to state-of-the-art policies.
引用
收藏
页码:424 / 439
页数:16
相关论文
共 50 条
  • [31] Finding structure in multi-armed bandits
    Schulz, Eric
    Franklin, Nicholas T.
    Gershman, Samuel J.
    COGNITIVE PSYCHOLOGY, 2020, 119
  • [32] ON MULTI-ARMED BANDITS AND DEBT COLLECTION
    Czekaj, Lukasz
    Biegus, Tomasz
    Kitlowski, Robert
    Tomasik, Pawel
    36TH ANNUAL EUROPEAN SIMULATION AND MODELLING CONFERENCE, ESM 2022, 2022, : 137 - 141
  • [33] Visualizations for interrogations of multi-armed bandits
    Keaton, Timothy J.
    Sabbaghi, Arman
    STAT, 2019, 8 (01):
  • [34] Multi-armed bandits with dependent arms
    Singh, Rahul
    Liu, Fang
    Sun, Yin
    Shroff, Ness
    MACHINE LEARNING, 2024, 113 (01) : 45 - 71
  • [35] On Kernelized Multi-Armed Bandits with Constraints
    Zhou, Xingyu
    Ji, Bo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [36] Multi-Armed Bandits in Metric Spaces
    Kleinberg, Robert
    Slivkins, Aleksandrs
    Upfal, Eli
    STOC'08: PROCEEDINGS OF THE 2008 ACM INTERNATIONAL SYMPOSIUM ON THEORY OF COMPUTING, 2008, : 681 - +
  • [37] Multi-Armed Bandits With Costly Probes
    Elumar, Eray Can
    Tekin, Cem
    Yagan, Osman
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2025, 71 (01) : 618 - 643
  • [38] Multi-armed bandits with episode context
    Christopher D. Rosin
    Annals of Mathematics and Artificial Intelligence, 2011, 61 : 203 - 230
  • [39] MULTI-ARMED BANDITS AND THE GITTINS INDEX
    WHITTLE, P
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1980, 42 (02): : 143 - 149
  • [40] Multi-armed bandits with switching penalties
    Asawa, M
    Teneketzis, D
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1996, 41 (03) : 328 - 348