Privacy Protection Practice for Data Mining with Multiple Data Sources: An Example with Data Clustering

Cited by: 2
Authors
O'Shaughnessy, Pauline [1 ]
Lin, Yan-Xia [1 ]
Affiliations
[1] Univ Wollongong, Sch Math & Appl Stat, Wollongong, NSW 2522, Australia
Keywords
data masking; multiplicative noise; data mining; sample size calculation
DOI
10.3390/math10244744
Chinese Library Classification
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
In the age of data, data mining provides feasible tools with which to handle large datasets consisting of data from multiple sources. However, there is limited research on retrieving statistical information from data when data are confidential and cannot be shared directly. In this paper, we address this problem and propose a framework for performing data analysis using data from multiple sources without revealing true values for privacy purposes. The proposed framework includes three steps. First, data custodians individually mask data before publishing; then, the masked data collection is used to reconstruct the density function of the original dataset, from which resampled values are generated; last, existing data mining techniques are applied directly to the resampled data. This framework utilises the technique of reconstructing an original density function from noise-masked data using the moment-based density estimation method, which plays an essential role. Simulation studies show that the proposed framework performs well; analysis results from the resampled data are comparable to those of the original data when the density of the original data is estimated well. The proposed framework is demonstrated in data clustering analysis using the example of a real-life Australian soybean dataset. Results from the k-means algorithms with two and three fitted clusters are presented to show that cluster analysis using resampled data can well replicate that of the original data.
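The three steps described in the abstract can be illustrated with a minimal sketch, under several assumptions that are not taken from the paper: the masking noise is uniform on a hypothetical interval [0.5, 1.5], the data are simulated, and a two-moment gamma fit stands in for the paper's moment-based density estimation method, which is more general. The one fact relied on is that, for masked values Y = X * C with noise C independent of X, the raw moments satisfy E[X^k] = E[Y^k] / E[C^k]. All function names are illustrative.

```python
import numpy as np

def mask_multiplicative(x, low, high, rng):
    """Step 1: the custodian publishes x * c, with c ~ Uniform(low, high)."""
    noise = rng.uniform(low, high, size=x.shape)
    return x * noise

def uniform_raw_moment(k, low, high):
    """k-th raw moment E[C^k] of Uniform(low, high)."""
    return (high ** (k + 1) - low ** (k + 1)) / ((k + 1) * (high - low))

def recover_moments(y, low, high, max_order):
    """Step 2a: estimate E[X^k] from masked data as E[Y^k] / E[C^k]."""
    return np.array([
        np.mean(y ** k) / uniform_raw_moment(k, low, high)
        for k in range(1, max_order + 1)
    ])

def resample_gamma(moments, size, rng):
    """Step 2b (stand-in): fit a gamma density to the first two recovered
    moments and draw resampled values. This is NOT the paper's estimator."""
    mean = moments[0]
    var = moments[1] - mean ** 2
    shape = mean ** 2 / var
    scale = var / mean
    return rng.gamma(shape, scale, size=size)

rng = np.random.default_rng(0)

# Simulated confidential data (assumption): gamma with mean 8, variance 16.
x = rng.gamma(shape=4.0, scale=2.0, size=50_000)

y = mask_multiplicative(x, 0.5, 1.5, rng)           # published, masked values
m_hat = recover_moments(y, 0.5, 1.5, max_order=2)   # moments of x, from y only
x_star = resample_gamma(m_hat, size=x.size, rng=rng)

print("recovered moments:", m_hat)
print("true moments:     ", [x.mean(), np.mean(x ** 2)])
```

In the third step, an existing data mining technique such as k-means would be run on `x_star` in place of `x`. The gamma stand-in can only reproduce a unimodal shape, so it is suited to showing the moment-recovery idea rather than cluster structure.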
Pages: 13
Related papers
50 in total
  • [21] PRIVACY PRESERVATION FOR DISTANCE BASED DATA MINING IN DISTRIBUTED DATA
    Mtengwa, Rudo R.
    Mawuli, Cobbinah Bernard
    Kulevome, Delanyo
    Hailemichael, Mamo Tadiyos
    Agbley, Fortune
    2022 19TH INTERNATIONAL COMPUTER CONFERENCE ON WAVELET ACTIVE MEDIA TECHNOLOGY AND INFORMATION PROCESSING (ICCWAMTIP), 2022
  • [22] Privacy-Preserving Data Mining and the Need for Confluence of Research and Practice
    Fu, Lixin
    Nemati, Hamid
    Sadri, Fereidoon
    INTERNATIONAL JOURNAL OF INFORMATION SECURITY AND PRIVACY, 2007, 1 (01) : 47 - 64
  • [23] Privacy Preserving Data Mining in Terms of DBSCAN Clustering Algorithm in Distributed Systems
    Anikin, Igor V.
    Gazimov, Rinat M.
    2018 INTERNATIONAL CONFERENCE ON INDUSTRIAL ENGINEERING, APPLICATIONS AND MANUFACTURING (ICIEAM), 2018
  • [24] A Survey on Privacy Preserving Data Mining
    Wang, Jian
    Luo, Yongcheng
    Zhao, Yan
    Le, Jianjin
    FIRST INTERNATIONAL WORKSHOP ON DATABASE TECHNOLOGY AND APPLICATIONS, PROCEEDINGS, 2009, : 111 - 114
  • [25] Informational privacy, data mining, and the Internet
    Tavani H.T.
    Ethics and Information Technology, 1999, 1 (2) : 137 - 145
  • [26] An Overview of Privacy Preserving Data Mining
    Qi, Xinjun
    Zong, Mingkui
    2011 INTERNATIONAL CONFERENCE OF ENVIRONMENTAL SCIENCE AND ENGINEERING, VOL 12, PT B, 2012, 12 : 1341 - 1347
  • [27] Data mining in mining engineering: results of classification and clustering of shovels failures data
    Dindarloo, Saeid R.
    Siami-Irdemoosa, Elnaz
    INTERNATIONAL JOURNAL OF MINING RECLAMATION AND ENVIRONMENT, 2017, 31 (02) : 105 - 118
  • [28] A Privacy-Preserving Data Obfuscation Scheme Used in Data Statistics and Data Mining
    Yang, Pan
    Gui, Xiaolin
    Tian, Feng
    Yao, Jing
    Lin, Jiancai
    2013 IEEE 15TH INTERNATIONAL CONFERENCE ON HIGH PERFORMANCE COMPUTING AND COMMUNICATIONS & 2013 IEEE INTERNATIONAL CONFERENCE ON EMBEDDED AND UBIQUITOUS COMPUTING (HPCC_EUC), 2013, : 881 - 887
  • [29] Distributed threshold k-means clustering for privacy preserving data mining
    Baby, Vadlana
    Chandra, N. Subhash
    2016 INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING, COMMUNICATIONS AND INFORMATICS (ICACCI), 2016, : 2286 - 2289
  • [30] A new data clustering approach for data mining in large databases
    Tsai, CF
    Wu, HC
    Tsai, CW
    I-SPAN'02: INTERNATIONAL SYMPOSIUM ON PARALLEL ARCHITECTURES, ALGORITHMS AND NETWORKS, PROCEEDINGS, 2002, : 315 - 320