DI-Mondrian: Distributed improved Mondrian for satisfaction of the L-diversity privacy model using Apache Spark

Cited by: 27
Authors
Ashkouti, Farough [1]
Khamforoosh, Keyhan [1]
Sheikhahmadi, Amir [1]
Affiliations
[1] Islamic Azad Univ, Dept Comp Engn, Sanandaj Branch, Sanandaj, Iran
Keywords
Anonymization; PPDP; K-anonymity; L-diversity; Information loss; Apache Spark; RDD; Big data privacy; MapReduce; Anonymity
DOI
10.1016/j.ins.2020.07.066
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Extracting useful patterns from collected data requires distributing and sharing that data with analysts, which puts the privacy and identity of the individuals it describes at risk. This paper extends and improves the Mondrian multidimensional anonymization method to satisfy the l-diversity privacy model and implements it in a distributed fashion on the Apache Spark framework. Because a major challenge in data privacy is the tradeoff between privacy and data utility, the method focuses on information loss and classifier evaluation criteria. The cut dimension is selected using the coefficient of variation and information gain, and the cut points are chosen dynamically; this reduces information loss and improves classifier performance measures such as accuracy and F-measure compared with previous algorithms in the literature. Since processing in Spark is reported to be up to 100 times faster than in the Hadoop framework, the proposed method is implemented with RDD-based programming in Apache Spark, which addresses the speed problem that previous Hadoop-based algorithms face in large-scale data anonymization. Experiments on numerical datasets demonstrate the improvements made by the proposed method. (C) 2020 Elsevier Inc. All rights reserved.
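As a rough illustration of the partitioning idea in the abstract, the following single-machine Python sketch splits records recursively, preferring the quasi-identifier with the highest coefficient of variation and accepting a cut only if both halves remain l-diverse. It is not the authors' implementation: the record layout, the function names, the use of distinct l-diversity, and the median cut point are all assumptions made for illustration, and the information gain criterion, the dynamic cut-point selection, and the Spark RDD distribution described in the abstract are omitted.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(values):
    """Standard deviation divided by the absolute mean (0.0 if the mean is 0)."""
    m = mean(values)
    return pstdev(values) / abs(m) if m != 0 else 0.0


def is_l_diverse(records, l):
    """Distinct l-diversity (assumed variant): at least l distinct sensitive values."""
    return len({r["sensitive"] for r in records}) >= l


def partition(records, quasi_identifiers, l):
    """Recursively split records; return a list of l-diverse partitions."""
    # Try dimensions in descending order of coefficient of variation.
    dims = sorted(
        quasi_identifiers,
        key=lambda a: coefficient_of_variation([r[a] for r in records]),
        reverse=True,
    )
    for dim in dims:
        # Median cut is a stand-in here, not the paper's dynamic cut point.
        cut = median([r[dim] for r in records])
        left = [r for r in records if r[dim] <= cut]
        right = [r for r in records if r[dim] > cut]
        if left and right and is_l_diverse(left, l) and is_l_diverse(right, l):
            return (partition(left, quasi_identifiers, l)
                    + partition(right, quasi_identifiers, l))
    # No allowable cut: this group becomes one equivalence class.
    return [records]


if __name__ == "__main__":
    # Hypothetical toy data; "age" and "zip" are assumed quasi-identifiers.
    data = [
        {"age": 25, "zip": 100, "sensitive": "flu"},
        {"age": 27, "zip": 101, "sensitive": "cold"},
        {"age": 52, "zip": 100, "sensitive": "flu"},
        {"age": 58, "zip": 101, "sensitive": "asthma"},
    ]
    for group in partition(data, ["age", "zip"], l=2):
        print([(r["age"], r["zip"]) for r in group])
```

In a full anonymizer, each returned group would then be generalized, for example by replacing every quasi-identifier value with the group's range.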
Pages: 1-24
Page count: 24