Output Privacy in Data Mining

Cited by: 13
Authors
Wang, Ting [1 ]
Liu, Ling [1 ]
Affiliations
[1] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
ACM TRANSACTIONS ON DATABASE SYSTEMS | 2011 / Vol. 36 / No. 01
Funding
National Science Foundation (USA);
Keywords
Security; Algorithm; Experimentation; Output privacy; stream mining; data perturbation; ANONYMITY; SECURITY;
DOI
10.1145/1929934.1929935
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812 ;
Abstract
Privacy has been identified as a vital requirement in designing and implementing data mining systems. In general, privacy preservation demands protecting both input and output privacy: the former refers to sanitizing the raw data itself before performing mining, while the latter refers to protecting the mining output (models or patterns) against malicious inference attacks. This article presents a systematic study of the problem of protecting output privacy in data mining, and particularly in stream mining: (i) we highlight the importance of this problem by showing that even sufficient protection of input privacy does not guarantee that of output privacy; (ii) we present a general inferencing and disclosure model that exploits the intrawindow and interwindow privacy breaches in stream mining output; (iii) we propose a lightweight countermeasure that effectively eliminates these breaches without explicitly detecting them, while minimizing the loss of output accuracy; (iv) we further optimize the basic scheme by taking account of two types of semantic constraints, aiming at maximally preserving utility-related semantics while maintaining a hard privacy guarantee; (v) finally, we conduct an extensive experimental evaluation over both synthetic and real data to validate the efficacy of our approach.
Pages: 34
Related papers
37 in total
[1]  
ADAM NR, 1989, COMPUT SURV, V21, P515, DOI 10.1145/76894.76895
[2]  
Agrawal D., 2001, PROC 20 ACM SIGMOD S, P247, DOI 10.1145/375551.375602
[3]  
Agrawal R, 2000, SIGMOD REC, V29, P439, DOI 10.1145/335191.335438
[4]  
Agrawal R., 1994, VLDB 1994, P487
[5]  
[Anonymous], 2002, Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, DOI 10.1145/775047.775080
[6]  
[Anonymous], 2001, ACM Transactions on Computational Logic, DOI 10.1145/377978.377983
[7]   Anonymity preserving pattern discovery [J].
Atzori, Maurizio ;
Bonchi, Francesco ;
Giannotti, Fosca ;
Pedreschi, Dino .
VLDB JOURNAL, 2008, 17 (04) :703-727
[8]  
Babcock B., 2002, PODS, P1, DOI 10.1145/543613.543615
[9]  
Bu SF, 2007, PROC INT CONF DATA, P671
[10]  
Calders T., 2002, Principles of Data Mining and Knowledge Discovery. 6th European Conference, PKDD 2002. Proceedings (Lecture Notes in Artificial Intelligence Vol.2431), P74