Privacy protection in data mining: A perturbation approach for categorical data

Cited by: 21

Authors:

Li, Xiao-Bai [1]

Sarkar, Sumit

Affiliations:

[1] Univ Massachusetts, Coll Management, Lowell, MA 01854 USA

[2] Univ Texas, Sch Management, Richardson, TX 75080 USA

Source:

INFORMATION SYSTEMS RESEARCH | 2006 / Vol. 17 / No. 03

Keywords:

privacy; data confidentiality; data mining; linear programming; Bayesian estimation; data swapping;

DOI:

10.1287/isre.1060.0095

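Chinese Library Classification (CLC) number:

G25 [Library science, librarianship]; G35 [Information science, information work];

Discipline classification code:

1205 ; 120501 ;

Abstract: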

To respond to growing concerns about privacy of personal information, organizations that use their customers' records in data-mining activities are forced to take actions to protect the privacy of the individuals involved. A common practice for many organizations today is to remove identity-related attributes from the customer records before releasing them to data miners or analysts. We investigate the effect of this practice and demonstrate that many records in a data set could be uniquely identified even after identity-related attributes are removed. We propose a perturbation method for categorical data that can be used by organizations to prevent or limit disclosure of confidential data for identifiable records when the data are provided to analysts for classification, a common data-mining task. The proposed method attempts to preserve the statistical properties of the data based on privacy protection parameters specified by the organization. We show that the problem can be solved in two phases, with a linear programming formulation in Phase I (to preserve the first-order marginal distribution), followed by a simple Bayes-based swapping procedure in Phase II (to preserve the joint distribution).
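
The abstract's observation that records can remain uniquely identifiable after identity-related attributes are removed can be illustrated with a minimal sketch. This is not the authors' method or data: the function name `unique_record_fraction`, the attribute names, and the toy records are all hypothetical, chosen only to show how a unique combination of remaining categorical values singles out a record.

```python
from collections import Counter

def unique_record_fraction(records, attributes):
    """Return the fraction of records whose value combination on
    `attributes` appears exactly once in the data set."""
    # Build one key per record from the remaining (non-identity) attributes.
    keys = [tuple(r[a] for a in attributes) for r in records]
    counts = Counter(keys)
    unique = sum(1 for k in keys if counts[k] == 1)
    return unique / len(keys) if keys else 0.0

# Hypothetical toy data: names and IDs are already removed,
# "diagnosis" plays the role of the confidential attribute.
records = [
    {"zip": "01854", "gender": "F", "age_group": "30-39", "diagnosis": "A"},
    {"zip": "01854", "gender": "F", "age_group": "30-39", "diagnosis": "B"},
    {"zip": "01854", "gender": "M", "age_group": "40-49", "diagnosis": "A"},
    {"zip": "75080", "gender": "F", "age_group": "20-29", "diagnosis": "C"},
]

fraction = unique_record_fraction(records, ["zip", "gender", "age_group"])
print(fraction)
```

In this toy data set two of the four records have a combination of `zip`, `gender`, and `age_group` shared by no other record, so an analyst who knows those values for a person could read off that person's confidential attribute.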

Pages: 254-270

Page count: 17

50 items in total

[1] OCDP: An enhanced perturbation approach for data privacy protection
Devi, S. Sathiya
Jayasri, K.
JOURNAL OF INFORMATION SECURITY AND APPLICATIONS, 2025, 90
[2] Privacy Protection in Data Mining
Fu, Chunchang
Zhang, Nan
2010 INTERNATIONAL CONFERENCE ON MANAGEMENT SCIENCE AND ENGINEERING (MSE 2010), VOL 2, 2010, : 92 - 93
[3] Protection or privacy? Data mining and personal data
Hand, DJ
ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PROCEEDINGS, 2006, 3918 : 1 - 10
[4] A tree-based data perturbation approach for privacy-preserving data mining
Li, Xiao-Bai
Sarkar, Sumit
IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (09) : 1278 - 1283
[5] A tree-based data perturbation approach for privacy-preserving data mining
IEEE Computer Society
Unknown
Unknown
IEEE Trans Knowl Data Eng, 2006, 9 (1278-1283):
[6] Hybrid Approach for Privacy Enhancement in Data Mining Using Arbitrariness and Perturbation
Murugeshwari, B.
Rajalakshmi, S.
Sudharson, K.
COMPUTER SYSTEMS SCIENCE AND ENGINEERING, 2023, 44 (03) : 2293 - 2307
[7] Geometric data perturbation for privacy preserving outsourced data mining
Chen, Keke
Liu, Ling
KNOWLEDGE AND INFORMATION SYSTEMS, 2011, 29 (03) : 657 - 695
[8] Geometric data perturbation for privacy preserving outsourced data mining
Keke Chen
Ling Liu
Knowledge and Information Systems, 2011, 29 : 657 - 695
[9] K-Anonymization approach for privacy preservation using data perturbation techniques in data mining
Kiran, Ajmeera
Shirisha, N.
MATERIALS TODAY-PROCEEDINGS, 2022, 64 : 578 - 584