Using Anonymized Data for Classification

Cited by: 45
Authors
Inan, Ali [1]
Kantarcioglu, Murat [1]
Bertino, Elisa [2]
Affiliations
[1] Univ Texas Dallas, Dept Comp Sci, Richardson, TX 75083 USA
[2] Purdue Univ, Dept Comp Sci, W Lafayette, IN 47907 USA
Source
ICDE: 2009 IEEE 25TH INTERNATIONAL CONFERENCE ON DATA ENGINEERING, VOLS 1-3 | 2009
DOI
10.1109/ICDE.2009.19
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202; 0835;
Abstract
In recent years, anonymization methods have emerged as an important tool to preserve individual privacy when releasing privacy-sensitive data sets. This interest in anonymization techniques has resulted in a plethora of methods for anonymizing data under different privacy and utility assumptions. At the same time, there has been little research addressing how to effectively use the anonymized data for data mining in general and for distributed data mining in particular. In this paper, we propose a new approach for building classifiers using anonymized data by modeling anonymized data as uncertain data. In our method, we do not assume any probability distribution over the data. Instead, we propose collecting all necessary statistics during anonymization and releasing these together with the anonymized data. We show that releasing such statistics does not violate anonymity. Experiments spanning various alternatives in both local and distributed data mining settings reveal that our method performs better than heuristic approaches for handling anonymized data.
Pages: 429 / +
Page count: 2