Generalization-based privacy preservation and discrimination prevention in data publishing and mining

Cited by: 34
Authors
Hajian, Sara [1]
Domingo-Ferrer, Josep [1]
Farras, Oriol [1]
Affiliation
[1] Univ Rovira & Virgili, Dept Comp Engn & Maths, UNESCO Chair Data Privacy, E-43007 Tarragona, Spain
Keywords
Data mining; Anti-discrimination; Privacy; Generalization; k-anonymity
DOI
10.1007/s10618-014-0346-1
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Life in the information society facilitates the automatic collection of huge amounts of data on individuals, organizations and other entities. Publishing such data for secondary analysis (e.g. learning models and finding patterns) may be extremely useful to policy makers, planners, marketing analysts, researchers and others. Yet data publishing and mining are not without dangers, namely privacy invasion and potential discrimination against the individuals whose data are published. Discrimination may ensue from training data mining models (e.g. classifiers) on data that are biased against certain protected groups (ethnicity, gender, political preferences, etc.). The objective of this paper is to describe how to obtain data sets for publication that are: (i) privacy-preserving; (ii) unbiased regarding discrimination; and (iii) as useful as possible for learning models and finding patterns. We present the first generalization-based approach to simultaneously offer privacy preservation and discrimination prevention. We formally define the problem, give an optimal algorithm to tackle it and evaluate the algorithm in terms of both general and specific data analysis metrics (i.e. various types of classifiers and rule induction algorithms). The impact of our transformation on data quality turns out to be the same as, or only slightly higher than, that of achieving privacy preservation alone. In addition, we show how to extend our approach to different privacy models and anti-discrimination legal concepts.
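The abstract does not fix the privacy model or the discrimination measure, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes k-anonymity over full-domain generalization of two quasi-identifiers (age, ZIP code) as the privacy model and, as the discrimination measure, the extended lift elift(A,B → C) = conf(A,B → C) / conf(B → C), where A is membership in a protected group, B is a generalized context and C is a negative decision; a context counts as discriminatory when elift ≥ α. The toy records, the hierarchies, and the names `generalize_age`, `generalize_zip`, `max_elift` and `alpha_protective_k_anonymous` are hypothetical. The exhaustive lattice search is a simple stand-in chosen for brevity, not the authors' optimal algorithm.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy records: (age, zip, gender, decision). Illustration only.
RECORDS = [
    (23, "43001", "F", "deny"), (27, "43002", "M", "grant"),
    (25, "43001", "M", "grant"), (28, "43002", "F", "deny"),
    (34, "43011", "F", "grant"), (36, "43012", "M", "deny"),
    (31, "43011", "M", "deny"), (38, "43012", "F", "grant"),
]
MAX_LEVEL = (2, 2)  # assumed top generalization level for (age, zip)


def generalize_age(age: int, level: int) -> str:
    """Level 0: exact age; level 1: decade interval; level 2: suppressed."""
    if level == 0:
        return str(age)
    if level == 1:
        low = (age // 10) * 10
        return f"{low}-{low + 9}"
    return "*"


def generalize_zip(zipcode: str, level: int) -> str:
    """Mask the last `level` digits of the ZIP code with '*'."""
    return zipcode[: len(zipcode) - level] + "*" * level


def group_by_context(records, levels):
    """Group (gender, decision) pairs by generalized quasi-identifiers (context B)."""
    groups = defaultdict(list)
    for age, zipcode, gender, decision in records:
        key = (generalize_age(age, levels[0]), generalize_zip(zipcode, levels[1]))
        groups[key].append((gender, decision))
    return groups


def is_k_anonymous(groups, k):
    """Every equivalence class must contain at least k records."""
    return all(len(members) >= k for members in groups.values())


def max_elift(groups, protected="F", outcome="deny"):
    """Largest elift(A,B -> C) = conf(A,B -> C) / conf(B -> C) over all contexts B."""
    worst = 0.0
    for members in groups.values():
        conf_b = sum(d == outcome for _, d in members) / len(members)
        prot = [d for g, d in members if g == protected]
        if not prot or conf_b == 0:
            continue  # elift is undefined in this context, so skip it
        conf_ab = sum(d == outcome for d in prot) / len(prot)
        worst = max(worst, conf_ab / conf_b)
    return worst


def alpha_protective_k_anonymous(records, k, alpha):
    """Return the full-domain generalization with the fewest total steps that is
    k-anonymous and alpha-protective (every elift < alpha), or None."""
    lattice = product(*(range(m + 1) for m in MAX_LEVEL))
    for levels in sorted(lattice, key=sum):  # lowest lattice height first
        groups = group_by_context(records, levels)
        if is_k_anonymous(groups, k) and max_elift(groups) < alpha:
            return levels
    return None


if __name__ == "__main__":
    # Privacy only (alpha = infinity): decade ages, exact ZIP codes.
    print(alpha_protective_k_anonymous(RECORDS, k=2, alpha=float("inf")))  # (1, 0)
    # Privacy plus discrimination prevention forces further generalization.
    print(alpha_protective_k_anonymous(RECORDS, k=2, alpha=1.5))  # (2, 2)
```

On these toy values, privacy alone stops at levels (1, 0), where one context still has an elift of 2.0; requiring α = 1.5 as well pushes generalization up to (2, 2), where the single remaining context has elift 1.0. This only illustrates how generalization can merge contexts and lower elift; it says nothing about the data-quality cost measured in the paper.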
Pages: 1158-1188
Page count: 31
Related papers (50 in total)
  • [31] Data Mining-Based Privacy Preservation Technique for Medical Dataset Over Horizontal Partitioned
    Mewada, Shivlal
    INTERNATIONAL JOURNAL OF E-HEALTH AND MEDICAL COMMUNICATIONS, 2021, 12 (05) : 50 - 66
  • [32] PMDG: Privacy for Multi-perspective Process Mining Through Data Generalization
    Hildebrant, Ryan
    Fahrenkrog-Petersen, Stephan A.
    Weidlich, Matthias
    Ren, Shangping
    ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2023, 2023, 13901 : 506 - 521
  • [33] PUBLISHING SENSITIVE TIME-SERIES DATA UNDER PRESERVATION OF PRIVACY AND DISTANCE ORDERS
    Choi, Mi-Jung
    Kim, Hea-Suk
    Moon, Yang-Sae
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2012, 8 (5B) : 3619 - 3638
  • [34] Hybrid optimization-based privacy preservation of database publishing in cloud environment
    Doss, Kingsleen Solomon
    Kamalakkannan, Somasundaram
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2022, 34 (11)
  • [35] A Targeted Privacy-Preserving Data Publishing Method Based on Bayesian Network
    Zhou, Zhigang
    Wang, Yu
    Yu, Xiao
    Miao, Junzhong
    IEEE ACCESS, 2022, 10 : 89555 - 89567
  • [36] Privacy Preservation in Utility Mining Based on Genetic Algorithm: A New Approach
    Rathi, Sugandha
    Soni, Rishi
    PROCEEDINGS OF FIFTH INTERNATIONAL CONFERENCE ON SOFT COMPUTING FOR PROBLEM SOLVING (SOCPROS 2015), VOL 2, 2016, 437 : 71 - 80
  • [37] Privacy Based Data Publishing Model for Cloud Computing Environment
    Bibal Benifa, J. V.
    Venifa Mini, G.
    WIRELESS PERSONAL COMMUNICATIONS, 2020, 113 (04) : 2215 - 2241
  • [39] Multiple Sensitive Attributes Based Privacy Preserving Data Publishing
    Vanasiwala, Jasmina N.
    Nanavati, Nirali R.
    PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON COMPUTING METHODOLOGIES AND COMMUNICATION (ICCMC 2018), 2018, : 394 - 400
  • [40] K-Anonymization approach for privacy preservation using data perturbation techniques in data mining
    Kiran, Ajmeera
    Shirisha, N.
    MATERIALS TODAY-PROCEEDINGS, 2022, 64 : 578 - 584