The Compromise of Data Privacy in Predictive Performance

Cited by: 5
Authors
Carvalho, Tania [1]
Moniz, Nuno [1, 2]
Affiliations
[1] Univ Porto, Fac Sci, Comp Sci Dept, Porto, Portugal
[2] INESC TEC, Porto, Portugal
Source
ADVANCES IN INTELLIGENT DATA ANALYSIS XIX, IDA 2021 | 2021 / Vol. 12695
Keywords
Data privacy; Supervised learning; Re-identification risk; Record linkage
DOI
10.1007/978-3-030-74251-5_34
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Privacy preservation has become an essential concern in many data mining applications since the emergence of legal obligations to protect personal data. The notion of Privacy-Preserving Data Mining thus emerged to allow the extraction of knowledge from data without violating the privacy of individuals. Several transformation techniques have been proposed to protect the privacy of individuals. However, their application does not guarantee a null risk of an individual being re-identified. Furthermore, and most importantly for this paper, the application of such techniques may have a considerable impact on the utility of data and their use in predictive and descriptive tasks. In this paper, we present a study to provide key insights concerning the impact of privacy-preserving techniques on predictive performance. Unlike previous work, our main conclusions point towards a noticeable impact of privacy-preservation techniques on predictive performance.
Pages: 426-438
Page count: 13