Effective Reconstruction of Data Perturbed by Random Projections

被引:17
作者
Sang, Yingpeng [1 ]
Shen, Hong [1 ,2 ]
Tian, Hui [3 ]
机构
[1] Beijing Jiaotong Univ, Sch Comp & Informat Technol, Beijing, Peoples R China
[2] Univ Adelaide, Sch Comp Sci, Adelaide, SA 5005, Australia
[3] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Beijing, Peoples R China
基金
澳大利亚研究理事会; 美国国家科学基金会;
关键词
Privacy-preserving data mining; data perturbation; data reconstruction; underdetermined independent component analysis; Maximum A Posteriori; principal component analysis; BLIND SOURCE SEPARATION; SPARSE; DECOMPOSITION;
D O I
10.1109/TC.2011.83
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Random Projection (RP) has raised great concern among the research community of privacy-preserving data mining, due to its high efficiency and utility, e. g., keeping the euclidean distances among the data points. It was shown in [ 33] that, if the original data set composed of m attributes is multiplied by a mixing matrix of k x m(m > k) which is random and orthogonal on expectation, then the k series of perturbed data can be released for mining purposes. Given the data perturbed by RP and some necessary prior knowledge, to our knowledge, little work has been done in reconstructing the original data to recover some sensitive information. In this paper, we choose several typical scenarios in data mining with different assumptions on prior knowledge. For the cases that an attacker has full or zero knowledge of the mixing matrix R, respectively, we propose reconstruction methods based on Underdetermined Independent Component Analysis (UICA) if the attributes of the original data are mutually independent and sparse, and propose reconstruction methods based on Maximum A Posteriori (MAP) if the attributes of the original data are correlated and nonsparse. Simulation results show that our reconstructions achieve high recovery rates, and outperform the reconstructions based on Principal Component Analysis (PCA). Successful reconstructions essentially mean the leakage of privacy, so our work identify the possible risks of RP when it is used for data perturbations.
引用
收藏
页码:101 / 117
页数:17
相关论文
共 53 条
[1]  
ADAM NR, 1989, COMPUT SURV, V21, P515, DOI 10.1145/76894.76895
[2]  
Aggarwal CC, 2008, ADV DATABASE SYST, V34, P1, DOI 10.1007/978-0-387-70992-5
[3]  
Agrawal D., 2001, PROC 20 ACM SIGMOD S, P247, DOI [10.1145/375551.375602, DOI 10.1145/375551.375602]
[4]  
Agrawal R., 2000, Privacy-preserving data mining, P439, DOI DOI 10.1145/342009.335438
[5]  
Agrawal S, 2005, PROC INT CONF DATA, P193
[6]  
[Anonymous], P 22 C UNC ART INT J
[7]  
[Anonymous], P 17 USENIX SEC S US
[8]  
[Anonymous], 1999, KDEX WORKSH, DOI [10.1109/KDEX.1999.836532, DOI 10.1109/KDEX.1999.836532]
[9]   Underdetermined blind source separation using sparse representations [J].
Bofill, P ;
Zibulevsky, M .
SIGNAL PROCESSING, 2001, 81 (11) :2353-2362
[10]   General approach to blind source separation [J].
Cao, XR ;
Liu, RW .
IEEE TRANSACTIONS ON SIGNAL PROCESSING, 1996, 44 (03) :562-571