Privacy protection in data mining: A perturbation approach for categorical data

Cited by: 21
Authors
Li, Xiao-Bai [1]
Sarkar, Sumit [2]
Affiliations
[1] Univ Massachusetts, Coll Management, Lowell, MA 01854 USA
[2] Univ Texas, Sch Management, Richardson, TX 75080 USA
Keywords
privacy; data confidentiality; data mining; linear programming; Bayesian estimation; data swapping
DOI
10.1287/isre.1060.0095
Chinese Library Classification (CLC) number
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract
To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution).
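The abstract's observation that records can remain uniquely identifiable after identity-related attributes are removed can be illustrated with a minimal sketch. This is not the authors' procedure; the function name `unique_record_fraction`, the attribute names, and the toy records are all hypothetical and chosen only for illustration. The sketch counts how many records have a combination of remaining categorical attributes that no other record shares.

```python
from collections import Counter


def unique_record_fraction(records, quasi_identifiers):
    """Return the fraction of records whose combination of the given
    attributes occurs exactly once in the data set."""
    if not records:
        return 0.0
    # Build one key per record from the selected categorical attributes.
    keys = [tuple(r[a] for a in quasi_identifiers) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy data: names are already removed, yet some attribute
# combinations still single out one record.
records = [
    {"zip": "01854", "gender": "F", "age_group": "30-39", "diagnosis": "A"},
    {"zip": "01854", "gender": "F", "age_group": "30-39", "diagnosis": "B"},
    {"zip": "01854", "gender": "M", "age_group": "40-49", "diagnosis": "A"},
    {"zip": "75080", "gender": "F", "age_group": "20-29", "diagnosis": "C"},
]

fraction = unique_record_fraction(records, ["zip", "gender", "age_group"])
print(fraction)
```

In this toy data set, two of the four records have an attribute combination shared with no other record, so an analyst who knows those attributes for a person could link that person to the confidential `diagnosis` value.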
Pages: 254-270
Number of pages: 17