Generalization-based privacy preservation and discrimination prevention in data publishing and mining

Cited by: 34
Authors
Hajian, Sara [1]
Domingo-Ferrer, Josep [1]
Farras, Oriol [1]
Affiliations
[1] Univ Rovira & Virgili, Dept Comp Engn & Maths, UNESCO Chair Data Privacy, E-43007 Tarragona, Spain
Keywords
Data mining; Anti-discrimination; Privacy; Generalization; k-anonymity
DOI
10.1007/s10618-014-0346-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Living in the information society facilitates the automatic collection of huge amounts of data on individuals, organizations, etc. Publishing such data for secondary analysis (e.g. learning models and finding patterns) may be extremely useful to policy makers, planners, marketing analysts, researchers and others. Yet, data publishing and mining do not come without dangers, namely privacy invasion and also potential discrimination of the individuals whose data are published. Discrimination may ensue from training data mining models (e.g. classifiers) on data which are biased against certain protected groups (ethnicity, gender, political preferences, etc.). The objective of this paper is to describe how to obtain data sets for publication that are: (i) privacy-preserving; (ii) unbiased regarding discrimination; and (iii) as useful as possible for learning models and finding patterns. We present the first generalization-based approach to simultaneously offer privacy preservation and discrimination prevention. We formally define the problem, give an optimal algorithm to tackle it and evaluate the algorithm in terms of both general and specific data analysis metrics (i.e. various types of classifiers and rule induction algorithms). It turns out that the impact of our transformation on the quality of data is the same or only slightly higher than the impact of achieving just privacy preservation. In addition, we show how to extend our approach to different privacy models and anti-discrimination legal concepts.
Pages: 1158-1188
Page count: 31