Achieving Probabilistic Anonymity in a Linear and Hybrid Randomization Model

Cited by: 8
Authors
Sang, Yingpeng [1]
Shen, Hong [1, 2]
Tian, Hui [3]
Zhang, Zonghua [4]
Affiliations
[1] Sun Yat Sen Univ, Sch Data & Comp Sci, Guangzhou 510006, Guangdong, Peoples R China
[2] Univ Adelaide, Sch Comp Sci, Adelaide, SA 5005, Australia
[3] Beijing Jiaotong Univ, Sch Elect & Informat Engn, Beijing 100044, Peoples R China
[4] Inst Mines Telecom, SAMOVAR Lab, F-59650 Lille, France
Funding
National Natural Science Foundation of China; Australian Research Council
Keywords
Randomization; k-anonymity; privacy protection; data mining; PRIVACY; ANONYMIZATION;
DOI
10.1109/TIFS.2016.2562605
Chinese Library Classification (CLC) number
TP301 [Theory, Methods]
Discipline classification code
081202
Abstract
Randomization methods applied in privacy-preserving data mining are commonly subject to reconstruction, linkage, and semantic-related attacks. Some existing works employ random noise addition to realize probabilistic anonymity, aiming only at linkage attacks. Random noise addition is vulnerable to reconstruction attacks and cannot achieve semantic closeness, particularly on high-dimensional data, to prevent semantic-related attacks. For linkage attacks, the main security vulnerability of the existing probabilistic anonymity lies in its assumption that the attacker has a priori knowledge of the quasi-identifiers of all individuals. When only some individuals leak their quasi-identifiers, that model fails, because the attacker can deploy a different linkage attack that has not been studied before. This type of attack is much easier to deploy and is thus very harmful. In this paper, we propose new frameworks of probabilistic (1, k)- and (k, k)-anonymity to defend against all these linkage attacks, and realize the frameworks on a hybrid randomization model. The model is also secure against reconstruction attacks. We further achieve statistical semantic closeness of high-dimensional data to prevent semantic-related attacks on the model. The frameworks also allow us to redesign the traditional K-nearest neighbor algorithm to leverage the introduced data uncertainty and improve the mining results. This paper demonstrates promising applications in large-scale, high-dimensional data mining in clouds: the approach provides high efficiency and security for protecting data privacy, and guarantees high data utility for mining, on-time processing, and non-interactive data publishing.
Pages: 2187-2202
Number of pages: 16