Anonymizing k-NN Classification on MapReduce

Cited by: 5
Authors
Bazai, Sibghat Ullah [1]
Jang-Jaccard, Julian [1]
Wang, Ruili [1]
Affiliation
[1] Massey Univ, Inst Nat & Math Sci, Auckland, New Zealand
Source
MOBILE NETWORKS AND MANAGEMENT (MONAMI 2017) | 2018 / Vol. 235
Keywords
MapReduce; Data anonymization; K-anonymity; k-NN classification; PRIVACY;
DOI
10.1007/978-3-319-90775-8_29
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Data analytics tasks such as classification play an important role in data mining: a classification algorithm identifies the category of a new observation and is often used to derive new knowledge. However, classification algorithms on big data analytics platforms such as MapReduce and Spark often run on plain text without an appropriate privacy protection mechanism. This leaves users' data vulnerable to unauthorized access and puts it at great privacy risk. To address this concern, we propose a novel k-NN classifier that can run on an anonymized dataset on the MapReduce platform. We describe new Map and Reduce algorithms that produce different anonymized datasets for the k-NN classifier. We also detail the experiments we performed on multiple anonymized datasets to understand the trade-off between the level of privacy protection (data privacy) and the high-value insights (data utility) before and after data anonymization.
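The abstract does not spell out the Map and Reduce algorithms, so the following is only a minimal single-process sketch of the general idea, not the authors' method. It assumes numeric quasi-identifiers are anonymized by generalizing each value to a fixed-width interval, that distances are measured to interval midpoints, and that each mapper emits its local k nearest candidates for a reducer to merge and vote on. All function names, the interval widths, and the toy records are hypothetical.

```python
import heapq
from collections import Counter


def generalize(value, width):
    """Replace a numeric value with the half-open interval [lo, lo + width)."""
    lo = (value // width) * width
    return (lo, lo + width)


def midpoint(interval):
    lo, hi = interval
    return (lo + hi) / 2


def distance(query, record):
    """Euclidean distance from a plain query to the interval midpoints."""
    return sum((q - midpoint(iv)) ** 2 for q, iv in zip(query, record)) ** 0.5


def map_phase(partition, query, k):
    """Assumed mapper: emit the k nearest (distance, label) pairs of one split."""
    scored = ((distance(query, rec), label) for rec, label in partition)
    return heapq.nsmallest(k, scored)


def reduce_phase(local_results, k):
    """Assumed reducer: merge per-split candidates, then take a majority vote."""
    merged = heapq.nsmallest(k, (pair for part in local_results for pair in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def classify(partitions, query, k):
    """Run the map phase over every split and reduce the results."""
    return reduce_phase([map_phase(p, query, k) for p in partitions], k)


# Toy (age, income) records; the values and labels are made up for illustration.
raw = [
    ((23, 51000), "low"), ((27, 48000), "low"), ((52, 90000), "high"),
    ((58, 99000), "high"), ((25, 50000), "low"), ((55, 95000), "high"),
]
widths = (10, 10000)  # hypothetical generalization widths per attribute
anonymized = [
    (tuple(generalize(v, w) for v, w in zip(rec, widths)), label)
    for rec, label in raw
]
partitions = [anonymized[:3], anonymized[3:]]  # stand-in for input splits
prediction = classify(partitions, (26, 52000), k=3)
```

Widening the intervals in `widths` makes more records indistinguishable from one another, which is one simple way to explore the privacy versus utility trade-off the abstract refers to: coarser intervals hide more detail but move the midpoints further from the original values.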
Pages: 364-377
Number of pages: 14