TBM, a transformation based method for microaggregation of large volume mixed data

Cited by: 0
Authors
Mostafa Salari
Saeed Jalili
Reza Mortazavi
Affiliations
[1] Tarbiat Modares University, Computer Engineering Department
[2] Damghan University, School of Engineering
Source
Data Mining and Knowledge Discovery | 2017 / Vol. 31
Keywords
Microaggregation; Large mixed data; k-anonymity; Privacy Preserving Data Publishing; Multidimensional scaling
DOI
Not available
Abstract
Due to recent advances in data collection and processing, some organizations now publish data for scientific and commercial purposes. Published data should be anonymized so that it stays useful while the privacy of data respondents is preserved. Microaggregation is a popular mechanism for data anonymization, but it naturally operates on numerical datasets. Real-world data, however, is usually mixed, i.e., it contains both numeric and categorical attributes. In this paper, we propose a novel transformation-based method for microaggregation of mixed data, called TBM. The method uses multidimensional scaling to generate a numeric equivalent of the mixed dataset. The partitioning step of microaggregation is performed on the equivalent dataset, but the aggregation step is performed on the original data. TBM can microaggregate large mixed datasets in a short time with low information loss. Experimental results show that the proposed method attains a better trade-off between data utility and privacy, in a shorter time, than traditional methods.
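The pipeline outlined in the abstract (embed mixed records numerically, partition on the embedding, aggregate on the original records) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The Gower-style dissimilarity, the classical (eigendecomposition-based) multidimensional scaling, the sort-and-chunk partitioning heuristic, and every function name below are assumptions chosen for brevity; the abstract does not specify which dissimilarity or partitioning algorithm TBM uses.

```python
from collections import Counter

import numpy as np


def mixed_distance(records, numeric_idx, categorical_idx):
    """Gower-style dissimilarity (an assumption, not taken from the paper):
    range-scaled absolute difference per numeric attribute,
    0/1 mismatch per categorical attribute, averaged over attributes."""
    num = np.array([[r[i] for i in numeric_idx] for r in records], dtype=float)
    span = num.max(axis=0) - num.min(axis=0)
    span[span == 0] = 1.0  # avoid division by zero on constant columns
    dist = (np.abs(num[:, None, :] - num[None, :, :]) / span).sum(axis=2)
    for j in categorical_idx:
        codes = np.unique([r[j] for r in records], return_inverse=True)[1]
        dist += (codes[:, None] != codes[None, :]).astype(float)
    return dist / (len(numeric_idx) + len(categorical_idx))


def classical_mds(dist, dims=1):
    """Classical MDS: double-center the squared distances and keep the
    eigenvectors with the largest eigenvalues, scaled by their square roots."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    vals, vecs = np.linalg.eigh(gram)
    top = np.argsort(vals)[::-1][:dims]
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))


def partition_fixed_size(coords, k):
    """Stand-in partitioning heuristic: sort by the first MDS coordinate and
    cut into consecutive groups of k; leftovers join the last group."""
    n = coords.shape[0]
    if n < k:
        raise ValueError("need at least k records")
    order = np.argsort(coords[:, 0], kind="stable").tolist()
    full = n - n % k
    groups = [order[i:i + k] for i in range(0, full, k)]
    groups[-1].extend(order[full:])
    return groups


def aggregate_on_original(records, groups, numeric_idx, categorical_idx):
    """Replace each record by its group representative computed on the
    ORIGINAL attributes: mean for numeric, mode for categorical."""
    out = [None] * len(records)
    for group in groups:
        rep = list(records[group[0]])
        for i in numeric_idx:
            rep[i] = sum(records[r][i] for r in group) / len(group)
        for i in categorical_idx:
            rep[i] = Counter(records[r][i] for r in group).most_common(1)[0][0]
        for r in group:
            out[r] = tuple(rep)
    return out


def microaggregate_sketch(records, k, numeric_idx, categorical_idx):
    """Hypothetical end-to-end driver: embed, partition, then aggregate."""
    coords = classical_mds(mixed_distance(records, numeric_idx, categorical_idx))
    groups = partition_fixed_size(coords, k)
    return aggregate_on_original(records, groups, numeric_idx, categorical_idx)
```

With hypothetical records of the form `(age, income, category)`, a call such as `microaggregate_sketch(records, k=3, numeric_idx=[0, 1], categorical_idx=[2])` returns one anonymized tuple per input record, and every distinct output tuple is shared by at least `k` records. The full pairwise distance matrix here costs quadratic memory, so this sketch is not suitable for the large datasets the paper targets.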
Pages: 65-91
Page count: 26