Differentially Private Algorithm for Graphical Bandits

Cited by: 0
Authors
Lu S.-Y. [1 ]
Wang G.-H. [1 ]
Qiu Z.-H. [1 ]
Zhang L.-J. [1 ]
Affiliations
[1] State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
Source
Ruan Jian Xue Bao/Journal of Software | 2022 / Vol. 33 / No. 9
Keywords
differential privacy; graphical bandits; independent set; Laplace noise; sequential decision making under uncertainty;
DOI
10.13328/j.cnki.jos.006386
Abstract
Graphical bandits are an important model for sequential decision making under uncertainty and have been applied in real-world scenarios such as social networks, e-commerce, and recommender systems. Existing work on graphical bandits only investigates how to identify the best arm rapidly so as to minimize the cumulative regret, while ignoring the privacy protection issue that arises in many real-world applications. To overcome this deficiency, a differentially private algorithm, termed graph-based arm elimination with differential privacy (GAP), is proposed for graphical bandits. On the one hand, GAP updates its arm selection strategy in epochs, based on the empirical mean rewards of the arms. The empirical mean rewards are perturbed by Laplace noise, which makes it hard for malicious attackers to infer the rewards of arms from the output of the algorithm, and thus protects privacy. On the other hand, in each epoch, GAP carefully constructs an independent set of the feedback graph and only explores arms in this independent set, which effectively utilizes the information in the graph feedback. It is proved that GAP is differentially private and that its regret bound matches the theoretical lower bound. Experimental results on synthetic datasets demonstrate that GAP effectively protects privacy while achieving cumulative regret comparable to that of existing non-private graphical bandit algorithms. © 2022 Chinese Academy of Sciences. All rights reserved.
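The two mechanisms the abstract describes, Laplace-perturbed empirical mean rewards and exploration restricted to an independent set of the feedback graph, can be sketched as follows. This is an illustrative sketch, not the paper's implementation: the function names, the assumption that rewards lie in [0, 1], and the greedy construction order are all assumptions made for the example.

```python
import math
import random

def laplace_noise(scale):
    """Sample from Laplace(0, scale) via the inverse CDF (stdlib only)."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def private_mean(rewards, epsilon):
    """Empirical mean perturbed by Laplace noise.

    Assuming rewards lie in [0, 1], the sensitivity of the mean over
    n pulls is 1/n, so noise of scale 1/(n * epsilon) gives
    epsilon-differential privacy for this single release.
    """
    n = len(rewards)
    return sum(rewards) / n + laplace_noise(1.0 / (n * epsilon))

def greedy_independent_set(neighbors):
    """Greedily build an independent set of arms in the feedback graph.

    `neighbors` maps each arm to the set of arms whose rewards are
    observed as side information when it is pulled; the chosen arms
    are pairwise non-adjacent, so no pull duplicates another's feedback.
    """
    chosen, excluded = [], set()
    for arm in sorted(neighbors):
        if arm not in excluded:
            chosen.append(arm)
            excluded.add(arm)
            excluded.update(neighbors[arm])
    return chosen
```

In a GAP-style epoch, one would pull only the arms in the independent set, update each arm's reward history from the resulting graph feedback, and eliminate arms whose noisy empirical means fall sufficiently below the best one.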