Privacy Protection Practice for Data Mining with Multiple Data Sources: An Example with Data Clustering

Cited by: 2
Authors
O'Shaughnessy, Pauline [1]
Lin, Yan-Xia [1]
Affiliation
[1] Univ Wollongong, Sch Math & Appl Stat, Wollongong, NSW 2522, Australia
Keywords
data masking; multiplicative noise; data mining; sample size calculation
DOI
10.3390/math10244744
Chinese Library Classification
O1 [Mathematics]
Subject Classification Code
0701; 070101
Abstract
In the age of data, data mining provides practical tools for handling large datasets that combine data from multiple sources. However, there is limited research on retrieving statistical information from data that are confidential and cannot be shared directly. In this paper, we address this problem and propose a framework for analysing data from multiple sources without revealing their true values, thereby protecting privacy. The proposed framework has three steps. First, each data custodian masks its data individually before publishing; second, the pooled masked data are used to reconstruct the density function of the original data, from which resampled values are generated; last, existing data mining techniques are applied directly to the resampled data. Central to the framework is the reconstruction of the original density function from noise-masked data using the moment-based density estimation method. Simulation studies show that the proposed framework performs well: analysis results from the resampled data are comparable to those from the original data when the density of the original data is well estimated. The framework is demonstrated in a data clustering analysis of a real-life Australian soybean dataset. Results from the k-means algorithm with two and three clusters show that cluster analysis of the resampled data closely replicates that of the original data.
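The three steps can be sketched end to end in Python for a single variable. This is a minimal illustration under stated assumptions, not the authors' implementation: the noise distribution Uniform(0.8, 1.2), the known support [0, 10], the moment order 6, the synthetic two-group data and every function name are hypothetical, and a Legendre-polynomial series stands in for the paper's moment-based density estimation method, whose basis and settings this abstract does not specify. Moment recovery relies only on the data and the noise being independent, so that E[y^k] = E[x^k] E[e^k]. The paper clusters the multivariate soybean data; here k-means runs on one variable with two clusters.

```python
import math
import random

# Hypothetical public noise distribution: e ~ Uniform(LO, HI), independent of x.
LO, HI = 0.8, 1.2
# Assumed publicly known support [A, B] of the confidential variable x.
A, B = 0.0, 10.0
ORDER = 6  # highest moment used by the density approximation (illustrative choice)


def noise_moment(k):
    """E[e^k] for e ~ Uniform(LO, HI)."""
    return (HI ** (k + 1) - LO ** (k + 1)) / ((k + 1) * (HI - LO))


def mask(values, rng):
    """Step 1: a custodian publishes y = x * e in place of each x."""
    return [x * rng.uniform(LO, HI) for x in values]


def recover_moments(masked):
    """E[x^k] = E[y^k] / E[e^k], which holds because x and e are independent."""
    n = len(masked)
    return [sum(y ** k for y in masked) / n / noise_moment(k) for k in range(ORDER + 1)]


def legendre_polys(order):
    """Power-series coefficients of P_0..P_order from the three-term recurrence."""
    polys = [[1.0], [0.0, 1.0]]
    for n in range(1, order):
        nxt = [0.0] * (n + 2)
        for i, c in enumerate(polys[n]):
            nxt[i + 1] += (2 * n + 1) * c  # (2n+1) t P_n(t)
        for i, c in enumerate(polys[n - 1]):
            nxt[i] -= n * c  # - n P_{n-1}(t)
        polys.append([c / (n + 1) for c in nxt])
    return polys[: order + 1]


def density_estimate(x_moments):
    """Step 2 (stand-in): Legendre-series density on [A, B] matching moments 0..ORDER."""
    alpha, beta = 2.0 / (B - A), -(A + B) / (B - A)  # t = alpha * x + beta lies in [-1, 1]
    # E[T^m] from the raw moments of x via the binomial expansion of (alpha*x + beta)^m.
    t_mom = [
        sum(math.comb(m, i) * alpha**i * beta ** (m - i) * x_moments[i] for i in range(m + 1))
        for m in range(ORDER + 1)
    ]
    polys = legendre_polys(ORDER)
    # Orthogonality of P_j on [-1, 1] gives a_j = (2j + 1) / 2 * E[P_j(T)].
    coeffs = [(2 * j + 1) / 2 * sum(c * t_mom[m] for m, c in enumerate(p)) for j, p in enumerate(polys)]

    def f(x):
        t = alpha * x + beta
        g = sum(a * sum(c * t**m for m, c in enumerate(p)) for a, p in zip(coeffs, polys))
        return alpha * g  # change of variables from t back to x

    return f


def resample(f, n, rng, cells=2000):
    """Step 2, continued: draw n values from max(f, 0) discretised on a grid over [A, B]."""
    width = (B - A) / cells
    mids = [A + (i + 0.5) * width for i in range(cells)]
    weights = [max(f(x), 0.0) for x in mids]  # negative lobes of the series are clipped
    picks = rng.choices(mids, weights=weights, k=n)
    return [x + rng.uniform(-width / 2, width / 2) for x in picks]  # jitter within each cell


def kmeans_1d(values, k=2, iters=30):
    """Step 3: plain Lloyd's k-means on one variable, started at evenly spaced quantiles."""
    data = sorted(values)
    centres = [data[(2 * j + 1) * len(data) // (2 * k)] for j in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for v in data:
            groups[min(range(k), key=lambda j: abs(v - centres[j]))].append(v)
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return sorted(centres)


def confidential_data(n, rng):
    """Hypothetical confidential values: two groups centred at 3 and 7, clipped to [A, B]."""
    return [min(max(rng.gauss(rng.choice([3.0, 7.0]), 1.0), A), B) for _ in range(n)]


rng = random.Random(2022)
custodian_a = confidential_data(2500, rng)
custodian_b = confidential_data(2500, rng)
published = mask(custodian_a, rng) + mask(custodian_b, rng)  # only masked values are pooled

f_hat = density_estimate(recover_moments(published))
resampled = resample(f_hat, len(published), rng)

print("centres from original data: ", kmeans_1d(custodian_a + custodian_b))
print("centres from resampled data:", kmeans_1d(resampled))
```

Raising ORDER lets the series resolve sharper features, but the recovered high-order moments become noisier: dividing by E[e^k] inflates the variance of the moment estimates, and the inflation grows with k. The sketch does not attempt the sample size calculation named among the paper's keywords.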
Pages: 13