Differentially Private K-Means Clustering and a Hybrid Approach to Private Optimization

被引:38
作者
Su, Dong [1 ]
Cao, Jianneng [2 ]
Li, Ninghui [1 ]
Bertino, Elisa [1 ]
Lyu, Min [3 ]
Jin, Hongxia [4 ]
机构
[1] Purdue Univ, Dept Comp Sci, 305 N Univ St, W Lafayette, IN 47907 USA
[2] Inst Infocomm Res, 1 Fusionopolis Way 21-01 Connexis South Tower, Singapore 138632, Singapore
[3] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Anhui, Peoples R China
[4] Samsung Res Amer, 665 Clyde Ave, Mountain View, CA 94043 USA
基金
美国国家科学基金会;
关键词
Differential privacy; k-means clustering; private data publishing;
D O I
10.1145/3133201
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
k-means clustering is a widely used clustering analysis technique in machine learning. In this article, we study the problem of differentially private k-means clustering. Several state-of-the-art methods follow the single-workload approach, which adapts an existing machine-learning algorithm by making each step private. However, most of them do not have satisfactory empirical performance. In this work, we develop techniques to analyze the empirical error behaviors of one of the state-of-the-art single-workload approaches, DPLloyd, which is a differentially private version of the Lloyd algorithm for k-means clustering. Based on the analysis, we propose an improvement of DPLloyd. We also propose a new algorithm for k-means clustering from the perspective of the noninteractive approach, which publishes a synopsis of the input dataset and then runs k-means on synthetic data generated from the synopsis. We denote this approach by EUGkM. After analyzing the empirical error behaviors of EUGkM, we further propose a hybrid approach that combines our DPLloyd improvement and EUGkM. Results from extensive and systematic experiments support our analysis and demonstrate the effectiveness of the DPLloyd improvement, EUGkM, and the hybrid approach.
引用
收藏
页数:33
相关论文
共 50 条
  • [21] Differentially private Riemannian optimization
    Han, Andi
    Mishra, Bamdev
    Jawanpuria, Pratik
    Gao, Junbin
    MACHINE LEARNING, 2024, 113 (03) : 1133 - 1161
  • [22] A Hybrid Approach for Spam Filtering using Local concentration based K-means Clustering
    Jain, Kunal
    Agrawal, Sanjay
    2014 5TH INTERNATIONAL CONFERENCE CONFLUENCE THE NEXT GENERATION INFORMATION TECHNOLOGY SUMMIT (CONFLUENCE), 2014, : 194 - 199
  • [23] Differentially Private Distributed Optimization
    Huang, Zhenqi
    Mitra, Sayan
    Vaidya, Nitin
    PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING AND NETWORKING, 2015,
  • [24] A Hybrid Approach for Detection and Removal of Raindrops Using k-means Clustering and Hough Transformation
    Kullarkar, S. P.
    Jain, S. V.
    HELIX, 2018, 8 (05): : 4056 - 4060
  • [25] On the Optimality of k-means Clustering
    Dalton, Lori A.
    2013 IEEE INTERNATIONAL WORKSHOP ON GENOMIC SIGNAL PROCESSING AND STATISTICS (GENSIPS 2013), 2013, : 70 - 71
  • [26] Clustering of Image Data Using K-Means and Fuzzy K-Means
    Rahmani, Md. Khalid Imam
    Pal, Naina
    Arora, Kamiya
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2014, 5 (07) : 160 - 163
  • [27] k-means clustering of extremes
    Janssen, Anja
    Wan, Phyllis
    ELECTRONIC JOURNAL OF STATISTICS, 2020, 14 (01): : 1211 - 1233
  • [28] Optimization of k-means clustering Segmentation in Head CT images
    Ma, Guoqiang
    Wang, Xiaojuan
    Li, XiaoLan
    INFORMATION TECHNOLOGY APPLICATIONS IN INDUSTRY II, PTS 1-4, 2013, 411-414 : 1247 - +
  • [29] Differentially Private K-anonymity.
    Anjum, Adeel
    Anjum, Adnan
    PROCEEDINGS OF 2014 12TH INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY, 2014, : 153 - 158
  • [30] Hyperspectral Image Classification: A k-means Clustering Based Approach
    Ranjan, Sameer
    Nayak, Deepak Ranjan
    Kumar, Kallepalli Satish
    Dash, Ratnakar
    Majhi, Banshidhar
    2017 4TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2017,