Managing dimensionality in data privacy anonymization

Cited by: 16
Authors
Zakerzadeh, Hessam [1]
Aggarwal, Charu C. [2]
Barker, Ken [1]
Affiliations
[1] Univ Calgary, Calgary, AB, Canada
[2] IBM TJ Watson Res Ctr, Yorktown Hts, NY USA
Keywords
High-dimensional anonymization; Privacy; k-Anonymity; l-Diversity; Vertical fragmentation
DOI
10.1007/s10115-015-0906-8
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The curse of dimensionality has remained a challenge for a wide variety of algorithms in data mining, clustering, classification, and privacy. Recently, it was shown that increasing dimensionality makes the data resistant to effective privacy. The theoretical results seem to suggest that the dimensionality curse is a fundamental barrier to privacy preservation. However, in practice, we show that some of the common properties of real data can be leveraged in order to greatly ameliorate the negative effects of the curse of dimensionality. In real data sets, many dimensions contain high levels of inter-attribute correlations. Such correlations enable the use of a process known as vertical fragmentation in order to decompose the data into vertical subsets of smaller dimensionality. An information-theoretic criterion of mutual information is used in the vertical decomposition process. This allows the use of an anonymization process, which is based on combining results from multiple independent fragments. We present a general approach, which can be applied to the k-anonymity, l-diversity, and t-closeness models. In the presence of inter-attribute correlations, such an approach continues to be much more robust in higher dimensionality, without losing accuracy. We present experimental results illustrating the effectiveness of the approach. This approach is resilient enough to prevent identity, attribute, and membership disclosure attacks.
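The abstract names mutual information as the criterion for vertical decomposition but does not spell out the procedure. The following is only a minimal sketch of the general idea, not the authors' algorithm: it estimates pairwise mutual information between categorical attributes and greedily groups strongly related attributes into fragments of bounded size. The function names, the `max_size` parameter, the greedy merge rule, and the toy data are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two categorical columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def greedy_fragments(columns, max_size):
    """Hypothetical heuristic: merge the most informative attribute pairs
    first, never letting a fragment exceed max_size attributes."""
    fragment_of = {name: {name} for name in columns}
    scores = {
        (a, b): mutual_information(columns[a], columns[b])
        for a, b in combinations(columns, 2)
    }
    for a, b in sorted(scores, key=scores.get, reverse=True):
        if scores[(a, b)] <= 0:
            break  # remaining pairs show no measured dependence
        fa, fb = fragment_of[a], fragment_of[b]
        if fa is not fb and len(fa) + len(fb) <= max_size:
            merged = fa | fb
            for name in merged:
                fragment_of[name] = merged
    # Collect each distinct fragment once, in first-seen order.
    fragments = []
    for frag in fragment_of.values():
        if frag not in fragments:
            fragments.append(frag)
    return fragments


# Toy table (made up): two pairs of perfectly correlated attributes.
columns = {
    "age_band": ["young", "young", "old", "old"],
    "student": ["yes", "yes", "no", "no"],
    "zip": ["A", "B", "A", "B"],
    "region": ["north", "south", "north", "south"],
}
fragments = greedy_fragments(columns, max_size=2)
```

On this toy table the correlated pairs end up in the same fragment, so each fragment could then be anonymized separately at lower dimensionality. How the paper actually selects fragments and combines their results is not stated in the abstract and is not reproduced here.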
Pages: 341 - 373
Number of pages: 33
Related papers
50 in total
  • [1] Managing dimensionality in data privacy anonymization
    Hessam Zakerzadeh
    Charu C. Aggarwal
    Ken Barker
    Knowledge and Information Systems, 2016, 49 : 341 - 373
  • [2] Data privacy in the Internet of Things based on anonymization: A review
    Neves, Flavio
    Souza, Rafael
    Sousa, Juliana
    Bonfim, Michel
    Garcia, Vinicius
    JOURNAL OF COMPUTER SECURITY, 2023, 31 (03) : 261 - 291
  • [3] Anonymization of Daily Activity Data by Using l-diversity Privacy Model
    Parameshwarappa, Pooja
    Chen, Zhiyuan
    Koru, Gunes
    ACM TRANSACTIONS ON MANAGEMENT INFORMATION SYSTEMS, 2021, 12 (03)
  • [4] Anonymization Techniques for Privacy Preserving Data Publishing: A Comprehensive Survey
    Majeed, Abdul
    Lee, Sungchang
    IEEE ACCESS, 2021, 9 : 8512 - 8545
  • [5] Hybrid Data Privacy and Anonymization Algorithms for Smart Health Applications
    Fakeeroodeen Y.N.
    Beeharry Y.
    SN Computer Science, 2021, 2 (2)
  • [6] Data anonymization to balance privacy and utility of online social media network data
    Gangarde, Rupali
    Shrivastava, Deepshikha
    Sharma, Amit
    Tandon, Tanishka
    Pawar, Ambika
    Garg, Rachit
    JOURNAL OF DISCRETE MATHEMATICAL SCIENCES & CRYPTOGRAPHY, 2022, 25 (03) : 829 - 838
  • [7] Data Anonymization for Privacy Aware Machine Learning
    Jaidan, David Nizar
    Carrere, Maxime
    Chemli, Zakaria
    Poisvert, Remi
    MACHINE LEARNING, OPTIMIZATION, AND DATA SCIENCE, 2019, 11943 : 725 - 737
  • [8] On the Role of Data Anonymization in Machine Learning Privacy
    Senavirathne, Navoda
    Torra, Vicenc
    2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020, : 664 - 675
  • [9] Anonymization : Securing privacy in IoT
    Kaur, Jashanpreet
    Sengupta, Jyotsna
    JOURNAL OF INFORMATION & OPTIMIZATION SCIENCES, 2020, 41 (06) : 1463 - 1477
  • [10] Privacy Preserving Big data Using Combine Anonymization and Encryption Approach
    Desai, Vidhi
    Chauhan, Gargi K.
    2019 INNOVATIONS IN POWER AND ADVANCED COMPUTING TECHNOLOGIES (I-PACT), 2019,