K-Means Clustering with Local Distance Privacy

被引:14
作者
Yang, Mengmeng [1 ]
Huang, Longxia [2 ]
Tang, Chenghua [3 ]
机构
[1] Commonwealth Sci & Ind Res Org, Data61, Melbourne 3168, Australia
[2] Jiangsu Univ, Sch Comp Sci & Commun Engn, Zhenjiang 212013, Jiangsu, Peoples R China
[3] Guilin Univ Elect Technol, Sch Comp Sci & Informat Secur, Guilin 541010, Peoples R China
关键词
K-means clustering; local differential privacy; data analysis;
D O I
10.26599/BDMA.2022.9020050
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
With the development of information technology, a mass of data are generated every day. Collecting and analysing these data help service providers improve their services and gain an advantage in the fierce market competition. K-means clustering has been widely used for cluster analysis in real life. However, these analyses are based on users' data, which disclose users' privacy. Local differential privacy has attracted lots of attention recently due to its strong privacy guarantee and has been applied for clustering analysis. However, existing K-means clustering methods with local differential privacy protection cannot get an ideal clustering result due to the large amount of noise introduced to the whole dataset to ensure the privacy guarantee. To solve this problem, we propose a novel method that provides local distance privacy for users who participate in the clustering analysis. Instead of making the users' records in-distinguish from each other in high-dimensional space, we map the user's record into a one-dimensional distance space and make the records in such a distance space not be distinguished from each other. To be specific, we generate a noisy distance first and then synthesize the high-dimensional data record. We propose a Bounded Laplace Method (BLM) and a Cluster Indistinguishable Method (CIM) to sample such a noisy distance, which satisfies the local differential privacy guarantee and local d(E)-privacy guarantee, respectively. Furthermore, we introduce a way to generate synthetic data records in high-dimensional space. Our experimental evaluation results show that our methods outperform the traditional methods significantly.
引用
收藏
页码:433 / 442
页数:10
相关论文
共 27 条
[1]  
Andres M. E., 2013, P ACM SIGSAC C COMP, P901
[2]  
[Anonymous], 2022, Apple Differential Privacy Technical Overview
[3]  
Blum A., 2005, P 24 ACM SIGMOD SIGA, P128
[4]  
Chang ALS, 2021, Arxiv, DOI arXiv:2104.09734
[5]  
Dhalaria M., 2021, Recent Patents Eng., V15, P225
[6]  
Ding BL, 2017, ADV NEUR IN, V30
[7]  
Dong CL, 2013, INT CONF MACH LEARN, P664, DOI 10.1109/ICMLC.2013.6890373
[8]   Minimax Optimal Procedures for Locally Private Estimation [J].
Duchi, John C. ;
Jordan, Michael I. ;
Wainwright, Martin J. .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2018, 113 (521) :182-201
[9]   The Algorithmic Foundations of Differential Privacy [J].
Dwork, Cynthia ;
Roth, Aaron .
FOUNDATIONS AND TRENDS IN THEORETICAL COMPUTER SCIENCE, 2013, 9 (3-4) :211-406
[10]   RAPPOR: Randomized Aggregatable Privacy-Preserving Ordinal Response [J].
Erlingsson, Ulfar ;
Pihur, Vasyl ;
Korolova, Aleksandra .
CCS'14: PROCEEDINGS OF THE 21ST ACM CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2014, :1054-1067