Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization

被引:39
作者
Su, Dong [1 ]
Cao, Jianneng [2 ]
Li, Ninghui [1 ]
Bertino, Elisa [1 ]
Lyu, Min [3 ]
Jin, Hongxia [4 ]
机构
[1] Purdue Univ, Dept Comp Sci, 305 N Univ St, W Lafayette, IN 47907 USA
[2] Inst Infocomm Res, 1 Fusionopolis Way 21-01 Connexis South Tower, Singapore 138632, Singapore
[3] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China
[4] Samsung Res Amer, 665 Clyde Ave, Mountain View, CA 94043 USA
基金
美国国家科学基金会;
关键词
Differential privacy; k-means clustering; private data publishing;
D O I
10.1145/3133201
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
k-means clustering is a widely used clustering analysis technique in machine learning. In this article, we study the problem of differentially private k-means clustering. Several state-of-the-art methods follow the single-workload approach, which adapts an existing machine-learning algorithm by making each step private. However, most of them do not have satisfactory empirical performance. In this work, we develop techniques to analyze the empirical error behaviors of one of the state-of-the-art single-workload approaches, DPLloyd, which is a differentially private version of the Lloyd algorithm for k-means clustering. Based on the analysis, we propose an improvement of DPLloyd. We also propose a new algorithm for k-means clustering from the perspective of the noninteractive approach, which publishes a synopsis of the input dataset and then runs k-means on synthetic data generated from the synopsis. We denote this approach by EUGkM. After analyzing the empirical error behaviors of EUGkM, we further propose a hybrid approach that combines our DPLloyd improvement and EUGkM. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of the DPLloyd improvement, EUGkM, and the hybrid approach.
引用
收藏
页数:33
相关论文
共 50 条
  • [31] Hyperspectral Image Classification: A k-means Clustering Based Approach
    Ranjan, Sameer
    Nayak, Deepak Ranjan
    Kumar, Kallepalli Satish
    Dash, Ratnakar
    Majhi, Banshidhar
    [J]. 2017 4TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2017,
  • [32] DIFFERENTIALLY PRIVATE ACCELERATED OPTIMIZATION ALGORITHMS
    Kuru, Nurdan
    Birbil, S. Ilker
    Gurbuzbalaban, Mert
    Yildirim, Sinan
    [J]. SIAM JOURNAL ON OPTIMIZATION, 2022, 32 (02) : 795 - 821
  • [33] Coarse Graining of Complex Networks: A k-means Clustering Approach
    Xu, Shuang
    Wang, Pei
    [J]. PROCEEDINGS OF THE 28TH CHINESE CONTROL AND DECISION CONFERENCE (2016 CCDC), 2016, : 4113 - 4118
  • [34] Image segmentation based on ant colony optimization and K-means clustering
    Zhao, Bo
    Zhu, Zhongxiang
    Mao, Enrong
    Song, Zhenghe
    [J]. 2007 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2007, : 459 - 463
  • [35] Recommendation Model Based on K-means Clustering Optimization Neural Network
    Lin Jinjian
    [J]. 2018 4TH INTERNATIONAL CONFERENCE ON EDUCATION, MANAGEMENT AND INFORMATION TECHNOLOGY (ICEMIT 2018), 2018, : 1362 - 1366
  • [36] Dynamic Incremental K-means Clustering
    Aaron, Bryant
    Tamir, Dan E.
    Rishe, Naphtali D.
    Kandel, Abraham
    [J]. 2014 INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE AND COMPUTATIONAL INTELLIGENCE (CSCI), VOL 1, 2014, : 308 - 313
  • [37] k-means clustering for persistent homology
    Cao, Yueqi
    Leung, Prudence
    Monod, Anthea
    [J]. ADVANCES IN DATA ANALYSIS AND CLASSIFICATION, 2025, 19 (01) : 95 - 119
  • [38] K-Means Clustering With Incomplete Data
    Wang, Siwei
    Li, Miaomiao
    Hu, Ning
    Zhu, En
    Hu, Jingtao
    Liu, Xinwang
    Yin, Jianping
    [J]. IEEE ACCESS, 2019, 7 : 69162 - 69171
  • [39] K-means - Laplacian clustering revisited
    Rengasamy, Sundar
    Murugesan, Punniyamoorthy
    [J]. ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 107
  • [40] Discriminative K-Means Laplacian Clustering
    Chao, Guoqing
    [J]. NEURAL PROCESSING LETTERS, 2019, 49 (01) : 393 - 405