A Differentially Private Approach for Budgeted Combinatorial Multi-Armed Bandits

Cited by: 0
Authors
Wang, Hengzhi [1 ]
Cui, Laizhong [1 ]
Wang, En [2 ,3 ]
Liu, Jiangchuan [4 ]
Affiliations
[1] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
[2] Jilin Univ, Dept Comp Sci & Technol, Changchun 130012, Peoples R China
[3] Jilin Univ, Key Lab Symbol Computat & Knowledge Engn, Minist Educ, Changchun 130012, Peoples R China
[4] Simon Fraser Univ, Sch Comp Sci, Burnaby, BC V5A 1S6, Canada
Funding
National Natural Science Foundation of China;
Keywords
Combinatorial multi-armed bandits; differential privacy; online learning; truthfulness;
DOI
10.1109/TDSC.2024.3401836
CLC Number
TP3 [Computing Technology, Computer Technology];
Subject Classification Number
0812;
Abstract
As a fundamental tool for sequential decision-making, the Combinatorial Multi-Armed Bandit (CMAB) model has been extensively analyzed and applied in various online applications. However, privacy concerns in budgeted CMAB have rarely been investigated thus far, and few bandit algorithms adequately address the privacy-preserving budgeted CMAB setting. Motivated by this, we study this setting using differential privacy as the formal measure of privacy. In this setting, playing an arm yields both a random reward and a random cost, and these values are kept private; in addition, multiple arms can be played in each round. The objective of the decision-maker is to minimize regret subject to a budget constraint on the cumulative cost of all played arms. We present an exploration-exploitation-balanced bandit policy that preserves the privacy of both rewards and costs in the budgeted CMAB setting. This policy is proven to be differentially private and is shown to achieve an upper bound on regret. Furthermore, to incentivize truthful cost reporting under the differentially private bandit policy, we introduce the concept of truthfulness and incorporate a payment mechanism that is proven to be $\sigma$-truthful. Numerical simulations on multiple real-world datasets validate the theoretical findings and demonstrate the effectiveness of our policy compared with state-of-the-art policies.
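For intuition only, below is a minimal, self-contained sketch of how a budgeted combinatorial bandit loop might privatize per-arm reward and cost statistics with Laplace noise before selecting arms. This is not the paper's actual policy: every name and constant (K, M, BUDGET, EPSILON, the simulated environment, the reward/cost-ratio selection rule) is an assumption made up for illustration, rewards and costs are assumed to lie in [0, 1], and the fresh per-round noise is purely illustrative since a rigorous end-to-end privacy guarantee would also have to account for composition across rounds.

```python
# Hypothetical sketch of a differentially private budgeted combinatorial bandit loop.
# Not the policy from the paper; all constants and the environment are assumptions.
import numpy as np

rng = np.random.default_rng(0)

K = 10          # number of arms (assumed)
M = 3           # arms played per round (assumed)
BUDGET = 500.0  # total cost budget (assumed)
EPSILON = 1.0   # per-round Laplace privacy parameter (assumed)

counts = np.zeros(K)
reward_sums = np.zeros(K)
cost_sums = np.zeros(K)

def private_mean(sums, n, eps):
    """Empirical means computed from Laplace-noised private sums.

    With values in [0, 1], each sum has sensitivity 1, so scale 1/eps gives an
    eps-DP release of that single query (composition across rounds is ignored here).
    """
    noisy = sums + rng.laplace(scale=1.0 / eps, size=sums.shape)
    return noisy / np.maximum(n, 1)

spent, t = 0.0, 0
while spent < BUDGET:  # may slightly overshoot in the final round
    t += 1
    # UCB-style exploration bonus on top of the noisy estimates.
    bonus = np.sqrt(2.0 * np.log(t + 1) / np.maximum(counts, 1))
    r_hat = private_mean(reward_sums, counts, EPSILON) + bonus
    c_hat = np.clip(private_mean(cost_sums, counts, EPSILON) - bonus, 1e-3, None)
    # Play the M arms with the largest estimated reward-to-cost ratio.
    super_arm = np.argsort(-(r_hat / c_hat))[:M]
    # Simulated environment: hidden Bernoulli rewards and uniform costs.
    rewards = (rng.random(M) < np.linspace(0.2, 0.8, K)[super_arm]).astype(float)
    costs = rng.uniform(0.1, 0.5, M)
    counts[super_arm] += 1
    reward_sums[super_arm] += rewards
    cost_sums[super_arm] += costs
    spent += costs.sum()

print(f"rounds played: {t}, budget spent: {spent:.1f}")
```

A faithful implementation would follow the paper's own exploration-exploitation-balanced policy and its $\sigma$-truthful payment mechanism rather than this simplified ratio rule; the sketch only illustrates the interplay of privatized statistics, combinatorial arm selection, and a cumulative cost budget.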
Pages: 424-439
Number of Pages: 16