Privacy protection in data mining: A perturbation approach for categorical data

Cited by: 21
Authors
Li, Xiao-Bai [1]
Sarkar, Sumit [2]
Affiliations
[1] Univ Massachusetts, Coll Management, Lowell, MA 01854 USA
[2] Univ Texas, Sch Management, Richardson, TX 75080 USA
Keywords
privacy; data confidentiality; data mining; linear programming; Bayesian estimation; data swapping
DOI
10.1287/isre.1060.0095
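Chinese Library Classification (CLC)
G25 [Library science, librarianship]; G35 [Information science, information work]
Discipline classification code
1205; 120501
Abstract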
To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution).
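The abstract's claim that records can remain uniquely identifiable after identity-related attributes are removed can be illustrated with a small sketch. This is not the authors' procedure; it only counts how many records have a combination of remaining attribute values that no other record shares. The function name `unique_record_fraction`, the attribute names, and the toy records are all hypothetical and chosen for illustration.

```python
from collections import Counter


def unique_record_fraction(records, quasi_identifiers):
    """Return the fraction of records whose combination of
    quasi-identifier values appears exactly once in the data set."""
    if not records:
        return 0.0
    # Build one key per record from the chosen categorical attributes.
    keys = [tuple(r[a] for a in quasi_identifiers) for r in records]
    counts = Counter(keys)
    # A record is uniquely identifiable if no other record shares its key.
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(records)


# Hypothetical toy data: names and IDs are already removed.
records = [
    {"zip": "01854", "gender": "F", "age_group": "30-39", "income": "high"},
    {"zip": "01854", "gender": "F", "age_group": "30-39", "income": "low"},
    {"zip": "01854", "gender": "M", "age_group": "40-49", "income": "high"},
    {"zip": "75080", "gender": "F", "age_group": "20-29", "income": "low"},
]

fraction = unique_record_fraction(records, ["zip", "gender", "age_group"])
print(fraction)
```

In this toy data, two of the four records have a unique combination of `zip`, `gender`, and `age_group`, so anyone who knows those three values for a person could read off that person's confidential `income` value. A perturbation method like the one the abstract describes is aimed at that kind of record.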
Pages: 254-270
Page count: 17
Related papers
50 items in total
  • [31] Random projection data perturbation based privacy protection in WSNs
    Ming, Zhao
    Zheng-Jiang, Wu
    Liu, Hui
    2017 IEEE CONFERENCE ON COMPUTER COMMUNICATIONS WORKSHOPS (INFOCOM WKSHPS), 2017, : 493 - 498
  • [32] The applicability of the perturbation based privacy preserving data mining for real-world data
    Liu, Li
    Kantarcioglu, Murat
    Thuraisingham, Bhavani
    DATA & KNOWLEDGE ENGINEERING, 2008, 65 (01) : 5 - 21
  • [33] An Efficient Approach for Privacy Preserving in Data Mining
    Sharma, Manish
    Chaudhary, Atul
    Mathuria, Manish
    Chaudhary, Shalini
    Kumar, Santosh
    2014 INTERNATIONAL CONFERENCE ON SIGNAL PROPAGATION AND COMPUTER TECHNOLOGY (ICSPCT 2014), 2014, : 244 - 249
  • [34] Condensation approach to privacy preserving data mining
    Aggarwal, CC
    Yu, PS
    ADVANCES IN DATABASE TECHNOLOGY - EDBT 2004, PROCEEDINGS, 2004, 2992 : 183 - 199
  • [35] A Cryptographic Approach for Achieving Privacy in Data Mining
    Abitha, N.
    Sarada, G.
    Manikandan, G.
    Sairam, N.
    2015 INTERNATIONAL CONFERENCE ON CIRCUITS, POWER AND COMPUTING TECHNOLOGIES (ICCPCT-2015), 2015
  • [36] Data privacy in construction industry by privacy-preserving data mining (PPDM) approach
    Patel T.
    Patel V.
    Asian Journal of Civil Engineering, 2020, 21 (3) : 505 - 515
  • [37] Against Classification Attacks: A Decision Tree Pruning Approach to Privacy Protection in Data Mining
    Li, Xiao-Bai
    Sarkar, Sumit
    OPERATIONS RESEARCH, 2009, 57 (06) : 1496 - 1509
  • [38] Blockchain Data Privacy Protection Mechanism for Enterprise Finance and Data Mining Algorithms
    Ma, Xuejun
    Zhang, Yongshan
    Engineering Intelligent Systems, 32 (05) : 435 - 443
  • [39] An efficient perturbation approach for multivariate data in sensitive and reliable data mining
    Paul, Mahit Kumar
    Islam, Md Rabiul
    Sattar, A. H. M. Sarowar
    JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2021, 62
  • [40] A Data Mining Approach to Assess Privacy Risk in Human Mobility Data
    Pellungrini, Roberto
    Pappalardo, Luca
    Pratesi, Francesca
    Monreale, Anna
    ACM TRANSACTIONS ON INTELLIGENT SYSTEMS AND TECHNOLOGY, 2018, 9 (03)