An accurate method for mining top-k frequent pattern under differential privacy

被引:0
|
作者
Zhang, Xiaojian [1 ,2 ]
Wang, Miao [1 ]
Meng, Xiaofeng [1 ]
机构
[1] School of Information, Renmin University of China
[2] School of Computer and Information Engineering, Henan University of Economics and Law
来源
Jisuanji Yanjiu yu Fazhan/Computer Research and Development | 2014年 / 51卷 / 01期
关键词
Differential privacy; Exponential mechanism; Frequent pattern mining; Laplace mechanism; Top-k pattern;
D O I
10.7544/issn1000-1239.2014.20130685
中图分类号
学科分类号
摘要
Frequent pattern mining is a popular technique for analyzing transaction datasets. However, because these datasets contain sensitive information (e.g., user behavior records, electronic health records, etc), directly releasing discovered frequent patterns with true support counts will carry significant risk to privacy of individuals. In this paper, we propose an efficient algorithm, called DP-topkP, based on differential privacy model, to accurately find top-k frequent patterns. To avoid the individuals' privacy leakage, in this algorithm, exponential mechanism is used to sample top-k frequent patterns in a candidate set, and Laplace mechanism is utilized to generate the noisy data for perturbing the true support counts of the sampled patterns. However, the noisy support counts returned may be inconsistent with query semantic constraints (e.g., descending order, integer, etc), which will make the utility of the discovered top-k patterns poor. To boost the accuracy of the returned noisy support counts, we take consistency constraints into account to conduct the post-processing step. We theoretically prove that the proposed method is (λ, δ)-useful and differentially private. The experimental results demonstrate that this method can maintain better accuracy, utility and scalability.
引用
收藏
页码:104 / 114
页数:10
相关论文
共 16 条
  • [1] Agrawal R., Srikant R., Fast algorithms for mining association rules in large databases, Proc of the 20th Int Conf on Very Large Data Bases (VLDB'94), pp. 487-499, (1994)
  • [2] Sweeney L., k-anonymity: A model for protecting privacy, International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10, 5, pp. 557-570, (2002)
  • [3] Atzori M., Bonchi F., Giannotti F., Et al., Anonymity preserving pattern discovery, The VLDB Journal, 17, 4, pp. 703-727, (2008)
  • [4] Ganta S.R., Kasiviswanathan S.P., Smith A., Composition attacks and auxiliary information in data privacy, Proc of the 14th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining (KDD08), pp. 265-273, (2008)
  • [5] Wong R.C.W., Fu A., Wang K., Et al., Can the utility of anonymized data be used for privacy breaches, ACM Trans on Knowledge Discovery from Data, 5, 3, pp. 16-39, (2011)
  • [6] Dwork C., Differential privacy, Proc of the 33th Colloquium on Automata, Languages and Programming (ICALP06), pp. 1-12, (2006)
  • [7] Dwork C., Differential privacy: A survey of results, Proc of the 5th Int Conf on Theory and Applications of Models of Computation (TAMC08), pp. 1-19, (2008)
  • [8] Dwork C., Lei J., Differential privacy and robust statistics, Proc of the 41st Annual ACM Symp on Theory of Computing (STOC09), pp. 371-380, (2009)
  • [9] Dwork C., The differential privacy frontier (extended abstract), Proc of the 6th Theory of Cryptography Conf (TCC09), pp. 496-502, (2009)
  • [10] Bhaskar R., Laxman S., Smith A., Et al., Discovering frequent patterns in sensitive data, Proc of the 16th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining (KDD10), pp. 503-512, (2010)