Privacy-Preserving Outlier Detection Through Random Nonlinear Data Distortion

Cited by: 20
Authors
Bhaduri, Kanishka [1]
Stefanski, Mark D. [2]
Srivastava, Ashok N.
Affiliations
[1] NASA, Miss Crit Technol Inc, Ames Res Ctr, Moffett Field, CA 94035 USA
[2] Stanford Univ, Dept Elect Engn, Stanford, CA 94305 USA
Source
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics | 2011 / Vol. 41 / No. 1
Funding
National Aeronautics and Space Administration (NASA)
Keywords
Data mining; non-linear; perturbation; privacy-preserving; algorithms
DOI
10.1109/TSMCB.2010.2051540
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Consider a scenario in which a data owner holds private or sensitive data and wants a data miner to study important patterns in them without the sensitive information being revealed. Privacy-preserving data mining aims to solve this problem by randomly transforming the data prior to their release to the data miners. Previous work considered only linear data perturbations (additive, multiplicative, or a combination of both) when studying the usefulness of the perturbed output. In this paper, we discuss data distortion using a potentially nonlinear random transformation and show how it can be useful for privacy-preserving anomaly detection from sensitive data sets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is that it allows a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that, for specific cases, it is distance preserving. A main contribution of this paper is the discussion of the relationship between the invertibility of a transformation and privacy preservation, and the application of these techniques to outlier detection. Experiments conducted on real-life data sets demonstrate the effectiveness of the approach.
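The general idea in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the distortion form `sigmoid(scale * (X @ W + b))`, the hypothetical `scale` parameter standing in for the "degree of nonlinearity", the helper names, and the use of a k-th nearest neighbor distance as the outlier score.

```python
import numpy as np

def sigmoid(z):
    """Elementwise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))

def distort(X, out_dim, scale, rng):
    """Hypothetical random nonlinear distortion (illustrative only).

    Applies a random linear map followed by a sigmoid. A small `scale`
    keeps the sigmoid in its near-linear range; a larger `scale`
    makes the map more strongly nonlinear.
    """
    W = rng.normal(size=(X.shape[1], out_dim)) / np.sqrt(X.shape[1])
    b = rng.normal(size=out_dim)
    return sigmoid(scale * (X @ W + b))

def knn_outlier_scores(X, k):
    """Score each point by the distance to its k-th nearest neighbor."""
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    dists.sort(axis=1)          # column 0 is the zero self-distance
    return dists[:, k]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))   # synthetic "sensitive" data
X[:3] += 6.0                    # plant three obvious outliers

Y = distort(X, out_dim=5, scale=0.2, rng=rng)  # what the owner would release

# Compare the top-3 outliers found on the original and on the distorted data.
top_orig = set(np.argsort(knn_outlier_scores(X, 5))[-3:].tolist())
top_dist = set(np.argsort(knn_outlier_scores(Y, 5))[-3:].tolist())
overlap = len(top_orig & top_dist)
print("outliers found in both views:", overlap, "of 3")
```

Varying `scale` and rerunning gives a rough feel for the trade-off the abstract describes between the strength of the distortion and how well outliers survive it; the paper's own bounds and privacy measures are not reproduced here.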
Pages: 260-272
Number of pages: 13