Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization

被引:38
作者
Su, Dong [1 ]
Cao, Jianneng [2 ]
Li, Ninghui [1 ]
Bertino, Elisa [1 ]
Lyu, Min [3 ]
Jin, Hongxia [4 ]
机构
[1] Purdue Univ, Dept Comp Sci, 305 N Univ St, W Lafayette, IN 47907 USA
[2] Inst Infocomm Res, 1 Fusionopolis Way 21-01 Connexis South Tower, Singapore 138632, Singapore
[3] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China
[4] Samsung Res Amer, 665 Clyde Ave, Mountain View, CA 94043 USA
基金
美国国家科学基金会;
关键词
Differential privacy; k-means clustering; private data publishing;
D O I
10.1145/3133201
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
k-means clustering is a widely used clustering analysis technique in machine learning. In this article, we study the problem of differentially private k-means clustering. Several state-of-the-art methods follow the single-workload approach, which adapts an existing machine-learning algorithm by making each step private. However, most of them do not have satisfactory empirical performance. In this work, we develop techniques to analyze the empirical error behaviors of one of the state-of-the-art single-workload approaches, DPLloyd, which is a differentially private version of the Lloyd algorithm for k-means clustering. Based on the analysis, we propose an improvement of DPLloyd. We also propose a new algorithm for k-means clustering from the perspective of the noninteractive approach, which publishes a synopsis of the input dataset and then runs k-means on synthetic data generated from the synopsis. We denote this approach by EUGkM. After analyzing the empirical error behaviors of EUGkM, we further propose a hybrid approach that combines our DPLloyd improvement and EUGkM. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of the DPLloyd improvement, EUGkM, and the hybrid approach.
引用
收藏
页数:33
相关论文
共 50 条
  • [11] Clustering Federated Learning with Differentially Private Optimization on Transformer
    Zhi, Yajing
    PROCEEDINGS OF THE 2024 3RD INTERNATIONAL CONFERENCE ON NETWORKS, COMMUNICATIONS AND INFORMATION TECHNOLOGY, CNCIT 2024, 2024, : 93 - 97
  • [12] Experience with a Hybrid Processor: K-Means Clustering
    Maya Gokhale
    Jan Frigo
    Kevin Mccabe
    James Theiler
    Christophe Wolinski
    Dominique Lavenier
    The Journal of Supercomputing, 2003, 26 : 131 - 148
  • [13] Study on the Location of Private Clinics Based on K-Means Clustering Method and an Integrated Evaluation Model
    Wang, Xiaojia
    Shao, Changyan
    Xu, Sheng
    Zhang, Shanshan
    Xu, Weiqun
    Guan, Yuxiang
    IEEE ACCESS, 2020, 8 : 23069 - 23081
  • [14] A Differentially Private Hybrid Approach to Traffic Monitoring
    Rocha, Rogerio V. M.
    Liborio, Pedro P.
    Patil, Harsh Kupwade
    Aranha, Diego F.
    APPLIED CRYPTOGRAPHY AND NETWORK SECURITY, ACNS 2021, PT II, 2021, 12727 : 233 - 256
  • [15] Experience with a hybrid processor: K-means clustering
    Gokhale, M
    Frigo, J
    Mccabe, K
    Theiler, J
    Wolinski, C
    Lavenier, D
    JOURNAL OF SUPERCOMPUTING, 2003, 26 (02) : 131 - 148
  • [16] Hybrid Iterative K-Means Clustering with Improved Moth-Flame Optimization
    Huang H.
    Li X.
    Wu K.
    Guo L.
    Wang H.
    Ru F.
    Hsi-An Chiao Tung Ta Hsueh/Journal of Xi'an Jiaotong University, 2020, 54 (09): : 32 - 39
  • [17] Differentially private user-based collaborative filtering recommendation based on κ-means clustering
    Chen, Zhili
    Wang, Yu
    Zhang, Shun
    Zhong, Hong
    Chen, Lin
    EXPERT SYSTEMS WITH APPLICATIONS, 2021, 168
  • [18] Early experience with a hybrid processor: K-Means Clustering
    Gokhale, M
    Frigo, J
    McCabe, K
    Theiler, J
    Lavenier, D
    ERSA 2001: PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON ENGINEERING OF RECONFIGURABLE SYSTEMS AND ALGORITHMS, 2001, : 29 - 35
  • [19] Differentially private Riemannian optimization
    Andi Han
    Bamdev Mishra
    Pratik Jawanpuria
    Junbin Gao
    Machine Learning, 2024, 113 : 1133 - 1161
  • [20] Differentially Private Fuzzy C-Means Clustering Algorithms for Fuzzy Datasets
    Shakiba, Ali
    2018 6TH IRANIAN JOINT CONGRESS ON FUZZY AND INTELLIGENT SYSTEMS (CFIS), 2018, : 91 - 93