Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization

被引:38
作者
Su, Dong [1 ]
Cao, Jianneng [2 ]
Li, Ninghui [1 ]
Bertino, Elisa [1 ]
Lyu, Min [3 ]
Jin, Hongxia [4 ]
机构
[1] Purdue Univ, Dept Comp Sci, 305 N Univ St, W Lafayette, IN 47907 USA
[2] Inst Infocomm Res, 1 Fusionopolis Way 21-01 Connexis South Tower, Singapore 138632, Singapore
[3] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China
[4] Samsung Res Amer, 665 Clyde Ave, Mountain View, CA 94043 USA
基金
美国国家科学基金会;
关键词
Differential privacy; k-means clustering; private data publishing;
D O I
10.1145/3133201
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
k-means clustering is a widely used clustering analysis technique in machine learning. In this article, we study the problem of differentially private k-means clustering. Several state-of-the-art methods follow the single-workload approach, which adapts an existing machine-learning algorithm by making each step private. However, most of them do not have satisfactory empirical performance. In this work, we develop techniques to analyze the empirical error behaviors of one of the state-of-the-art single-workload approaches, DPLloyd, which is a differentially private version of the Lloyd algorithm for k-means clustering. Based on the analysis, we propose an improvement of DPLloyd. We also propose a new algorithm for k-means clustering from the perspective of the noninteractive approach, which publishes a synopsis of the input dataset and then runs k-means on synthetic data generated from the synopsis. We denote this approach by EUGkM. After analyzing the empirical error behaviors of EUGkM, we further propose a hybrid approach that combines our DPLloyd improvement and EUGkM. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of the DPLloyd improvement, EUGkM, and the hybrid approach.
引用
收藏
页数:33
相关论文
共 50 条
  • [1] Differentially Private K-Means Clustering
    Su, Dong
    Cao, Jianneng
    Li, Ninghui
    Bertino, Elisa
    Jin, Hongxia
    CODASPY'16: PROCEEDINGS OF THE SIXTH ACM CONFERENCE ON DATA AND APPLICATION SECURITY AND PRIVACY, 2016, : 26 - 37
  • [2] Optimal Differentially Private Algorithms for k-Means Clustering
    Huang, Zhiyi
    Liu, Jinyan
    PODS'18: PROCEEDINGS OF THE 37TH ACM SIGMOD-SIGACT-SIGAI SYMPOSIUM ON PRINCIPLES OF DATABASE SYSTEMS, 2018, : 395 - 408
  • [3] Differentially Private k-Means Clustering With Convergence Guarantee
    Lu, Zhigang
    Shen, Hong
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2021, 18 (04) : 1541 - 1552
  • [4] A Convergent Differentially Private k-Means Clustering Algorithm
    Lu, Zhigang
    Shen, Hong
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2019, PT I, 2019, 11439 : 612 - 624
  • [5] DIFFERENTIALLY PRIVATE COMPRESSIVE K-MEANS
    Schellekens, V.
    Chatalic, A.
    Houssiau, F.
    de Montjoye, Y. -A.
    Jacques, L.
    Gribonval, R.
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 7933 - 7937
  • [6] Locally Private k-Means Clustering
    Stemmer, Uri
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [7] Differentially Private K-Means Clustering Applied to Meter Data Analysis and Synthesis
    Ravi, Nikhil
    Scaglione, Anna
    Kadam, Sachin
    Gentz, Reinhard
    Peisert, Sean
    Lunghino, Brent
    Levijarvi, Emmanuel
    Shumavon, Aram
    IEEE TRANSACTIONS ON SMART GRID, 2022, 13 (06) : 4801 - 4814
  • [8] Coresets for Differentially Private K-Means Clustering and Applications to Privacy in Mobile Sensor Networks
    Feldman, Dan
    Xiang, Chongyuan
    Zhu, Ruihao
    Rus, Daniela
    2017 16TH ACM/IEEE INTERNATIONAL CONFERENCE ON INFORMATION PROCESSING IN SENSOR NETWORKS (IPSN), 2017, : 3 - 15
  • [9] Utility-efficient differentially private K-means clustering based on cluster merging
    Ni, Tianjiao
    Qiao, Minghao
    Chen, Zhili
    Zhang, Shun
    Zhong, Hong
    NEUROCOMPUTING, 2021, 424 : 205 - 214
  • [10] Practical multi-party private collaborative k-means clustering
    Zhang, En
    Li, Huimin
    Huang, Yuchen
    Hong, Shuangxi
    Zhao, Le
    Ji, Congmin
    NEUROCOMPUTING, 2022, 467 : 256 - 265