Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization

被引:39
作者
Su, Dong [1 ]
Cao, Jianneng [2 ]
Li, Ninghui [1 ]
Bertino, Elisa [1 ]
Lyu, Min [3 ]
Jin, Hongxia [4 ]
机构
[1] Purdue Univ, Dept Comp Sci, 305 N Univ St, W Lafayette, IN 47907 USA
[2] Inst Infocomm Res, 1 Fusionopolis Way 21-01 Connexis South Tower, Singapore 138632, Singapore
[3] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China
[4] Samsung Res Amer, 665 Clyde Ave, Mountain View, CA 94043 USA
基金
美国国家科学基金会;
关键词
Differential privacy; k-means clustering; private data publishing;
D O I
10.1145/3133201
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
k-means clustering is a widely used clustering analysis technique in machine learning. In this article, we study the problem of differentially private k-means clustering. Several state-of-the-art methods follow the single-workload approach, which adapts an existing machine-learning algorithm by making each step private. However, most of them do not have satisfactory empirical performance. In this work, we develop techniques to analyze the empirical error behaviors of one of the state-of-the-art single-workload approaches, DPLloyd, which is a differentially private version of the Lloyd algorithm for k-means clustering. Based on the analysis, we propose an improvement of DPLloyd. We also propose a new algorithm for k-means clustering from the perspective of the noninteractive approach, which publishes a synopsis of the input dataset and then runs k-means on synthetic data generated from the synopsis. We denote this approach by EUGkM. After analyzing the empirical error behaviors of EUGkM, we further propose a hybrid approach that combines our DPLloyd improvement and EUGkM. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of the DPLloyd improvement, EUGkM, and the hybrid approach.
引用
收藏
页数:33
相关论文
共 50 条
[41]   DEEP CONVOLUTIONAL K-MEANS CLUSTERING [J].
Goel, Anurag ;
Majumdar, Angshul ;
Chouzenoux, Emilie ;
Chierchia, Giovanni .
2022 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, ICIP, 2022, :211-215
[42]   The Rough Membership k-Means Clustering [J].
Ubukata, Seiki ;
Notsu, Akira ;
Honda, Katsuhiro .
INTEGRATED UNCERTAINTY IN KNOWLEDGE MODELLING AND DECISION MAKING, IUKM 2016, 2016, 9978 :207-216
[43]   Bayesian hierarchical K-means clustering [J].
Liu, Yue ;
Li, Bufang .
INTELLIGENT DATA ANALYSIS, 2020, 24 (05) :977-992
[44]   Discriminative K-Means Laplacian Clustering [J].
Guoqing Chao .
Neural Processing Letters, 2019, 49 :393-405
[45]   Balanced Fair K-Means Clustering [J].
Pan, Renbo ;
Zhong, Caiming ;
Qian, Jiangbo .
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, 2024, 20 (04) :5914-5923
[46]   Weighting variables in K-means clustering [J].
Huh, Myung-Hoe ;
Lim, Yong B. .
JOURNAL OF APPLIED STATISTICS, 2009, 36 (01) :67-78
[47]   Hybrid Approach of EEG Stress Level Classification Using K-Means Clustering and Support Vector Machine [J].
Wen, Tee Yi ;
Aris, Siti Armiza Mohd .
IEEE ACCESS, 2022, 10 :18370-18379
[48]   Private Sampling: A Noiseless Approach for Generating Differentially Private Synthetic Data [J].
Boedihardjo, March ;
Strohmer, Thomas ;
Vershynin, Roman .
SIAM JOURNAL ON MATHEMATICS OF DATA SCIENCE, 2022, 4 (03) :1082-1115
[49]   A DIFFERENTIALLY PRIVATE DISTRIBUTED OPTIMIZATION METHOD FOR CONSTRAINED OPTIMIZATION [J].
Gu, Chuanye ;
Zhou, Tao ;
Li, Jueyou ;
Wu, Changzhi .
JOURNAL OF INDUSTRIAL AND MANAGEMENT OPTIMIZATION, 2023, 19 (11) :8299-8319
[50]   RK-means clustering: K-means with reliability [J].
Hua, Chunsheng ;
Chen, Qian ;
Wu, Haiyuan ;
Wada, Toshikazu .
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS, 2008, E91D (01) :96-104