A Differentially Private Approach for Budgeted Combinatorial Multi-Armed Bandits

Cited by: 0

Authors:

Wang, Hengzhi [1]

Cui, Laizhong [1]

Wang, En [2,3]

Liu, Jiangchuan [4]

Affiliations:

[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China

[2] Jilin Univ, Dept Comp Sci & Technol, Changchun 130012, Peoples R China

[3] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China

[4] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada

Source:

IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING | 2025, Vol. 22, No. 01

Funding:

National Natural Science Foundation of China;

Keywords:

Combinatorial multi-armed bandits; differential privacy; online learning; truthfulness;

DOI:

10.1109/TDSC.2024.3401836

Chinese Library Classification:

TP3 [Computing technology, computer technology];

Discipline classification code:

0812;

Abstract:

As a fundamental tool for sequential decision-making, the Combinatorial Multi-Armed Bandits (CMAB) model has been extensively analyzed and applied in various online applications. However, privacy concerns in budgeted CMAB have rarely been investigated, and few bandit algorithms adequately address the privacy-preserving budgeted CMAB setting. Motivated by this, we study this setting using differential privacy as the formal measure of privacy. In this setting, playing an arm yields both a random reward and a random cost, and these values are kept private. In addition, multiple arms can be played in each round. The objective of the decision-maker is to minimize regret subject to a budget constraint on the cumulative cost of all played arms. We present an exploration-exploitation-balanced bandit policy that preserves the privacy of both rewards and costs under budgeted CMAB settings. This policy is proven to be differentially private and to achieve an upper bound on regret. Furthermore, to give the differentially private bandit policy incentives that ensure the reported costs are truthful, we introduce the concept of truthfulness and incorporate a payment mechanism that is proven to be $\sigma$-truthful. Numerical simulations based on multiple real-world datasets validate the theoretical findings and demonstrate the effectiveness of our policy compared to state-of-the-art policies.
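The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' policy: it is a generic budgeted bandit loop that ranks arms by a noisy reward-to-cost ratio, with the Laplace mechanism standing in for whatever private estimator the paper uses. All names (`laplace_noise`, `private_mean`, `run_budgeted_cmab`), the UCB-style bonus, the cost floor, and the reward and cost distributions are assumptions made for illustration. The `epsilon` here is a per-query parameter; the sketch does not account for privacy composition across rounds, and it says nothing about the payment mechanism.

```python
import math
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values, epsilon, rng):
    # Laplace mechanism for the mean of values in [0, 1]:
    # changing one value moves the mean by at most 1/n.
    n = len(values)
    return sum(values) / n + laplace_noise(1.0 / (n * epsilon), rng)


def run_budgeted_cmab(reward_means, cost_means, budget, k, epsilon, seed=0):
    """Play up to k arms per round until the budget would be exceeded.

    Hypothetical environment: Bernoulli rewards and costs drawn
    uniformly within 0.1 of each arm's mean cost, clipped to [0, 1].
    """
    rng = random.Random(seed)
    n_arms = len(reward_means)
    rewards = [[] for _ in range(n_arms)]
    costs = [[] for _ in range(n_arms)]
    spent, total_reward, rounds = 0.0, 0.0, 0

    while True:
        unplayed = [i for i in range(n_arms) if not rewards[i]]
        if unplayed:
            # Initial exploration: make sure every arm has been played once.
            chosen = unplayed[:k]
        else:
            def index(i):
                # Assumed UCB-style bonus on top of the noisy estimates.
                bonus = math.sqrt(2.0 * math.log(rounds + 1) / len(rewards[i]))
                r = private_mean(rewards[i], epsilon, rng) + bonus
                c = private_mean(costs[i], epsilon, rng) - bonus
                return r / max(c, 0.05)  # floor keeps the ratio finite

            chosen = sorted(range(n_arms), key=index, reverse=True)[:k]

        # Realize this round's private rewards and costs.
        draws = []
        for i in chosen:
            reward = 1.0 if rng.random() < reward_means[i] else 0.0
            low = max(0.0, cost_means[i] - 0.1)
            high = min(1.0, cost_means[i] + 0.1)
            draws.append((i, reward, rng.uniform(low, high)))

        round_cost = sum(c for _, _, c in draws)
        if spent + round_cost > budget:
            break  # the budget constraint ends the game

        for i, reward, cost in draws:
            rewards[i].append(reward)
            costs[i].append(cost)
            total_reward += reward
        spent += round_cost
        rounds += 1

    return total_reward, spent, rounds


if __name__ == "__main__":
    total, spent, rounds = run_budgeted_cmab(
        reward_means=[0.9, 0.5, 0.2, 0.7],
        cost_means=[0.3, 0.5, 0.4, 0.6],
        budget=50.0, k=2, epsilon=1.0,
    )
    print(f"rounds={rounds} reward={total:.0f} spent={spent:.2f}")
```

Smaller values of `epsilon` add more noise to the estimates, so the ranking of arms becomes less reliable; this is the privacy-regret trade-off that a formal regret bound would quantify.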


Pages: 424-439

Page count: 16

