Statistical Privacy and Consent in Data Aggregation

Cited by: 0
Authors
Scope, Nick [1 ]
Rasin, Alexander [1 ]
Ben Lenard [2 ]
Wagner, James [3 ]
Affiliations
[1] DePaul Univ, Chicago, IL 60604 USA
[2] DePaul Univ, Argonne Natl Lab, Chicago, IL USA
[3] Univ New Orleans, New Orleans, LA 70148 USA
Source
SCIENTIFIC AND STATISTICAL DATABASE MANAGEMENT, 36TH INTERNATIONAL CONFERENCE, SSDBM 2024 | 2024
Funding
U.S. National Science Foundation;
Keywords
GDPR; Compliance; Processing consent; Privacy;
DOI
10.1145/3676288.3676298
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
As new laws governing the management of personal data are introduced, e.g., the European Union's General Data Protection Regulation of 2016 and the California Consumer Privacy Act of 2018, compliance with data governance legislation is becoming an increasingly important aspect of data management. An important component of many data privacy laws is that they require companies to use an individual's data only for purposes the individual has explicitly consented to. Prior methods for enforcing consent for aggregate queries either use access control to exclude data lacking consent from query evaluation or apply differential privacy algorithms that inject synthetic noise into query outcomes (or input data) to ensure that the anonymity of non-consenting individuals is preserved with high probability. Both approaches return query results that differ from the ground truth results corresponding to the full input containing data from both consenting and non-consenting individuals. We present an alternative framework for group-by aggregate queries, tailored to applications (e.g., medicine) where even a small deviation from the correct answer to a query cannot be tolerated. Our approach uses provenance to determine, for each output tuple of a group-by aggregate query, which individuals' data were used to derive the result for that group. We then use statistical tests to determine how likely it is that the presence of data from a non-consenting individual will be revealed by such an output tuple. We filter out tuples for which this test fails, i.e., which are deemed likely to reveal non-consenting data. Thus, our approach always returns a subset of the ground truth query answers. In our experiments, our approach returns only 100% accurate results in cases where access control or differential privacy would have returned either fewer results or less accurate ones.
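The abstract does not specify the statistical test or the provenance machinery in detail. The Python sketch below is only a minimal illustration of the general idea, assuming a pandas DataFrame with a boolean consent column and substituting a made-up disclosure test (a minimum group size plus an outlier check) for whatever test the paper actually uses; the parameter names and thresholds are hypothetical.

import pandas as pd

# Hypothetical parameters (not from the paper): smallest group size that is
# considered releasable, and a z-score bound for non-consenting contributions.
MIN_GROUP_SIZE = 5
OUTLIER_Z = 2.0

def filtered_groupby_mean(df: pd.DataFrame, group_col: str, value_col: str,
                          consent_col: str) -> pd.DataFrame:
    """Compute exact per-group means over ALL rows (consenting and not), then
    suppress output tuples that a simple disclosure test deems likely to reveal
    a non-consenting individual. `consent_col` is assumed to be boolean."""
    kept = []
    for group, rows in df.groupby(group_col):
        exact_mean = rows[value_col].mean()          # ground-truth answer for this group
        non_consenting = rows[~rows[consent_col]]    # provenance: rows contributed without consent
        if non_consenting.empty:
            kept.append((group, exact_mean))         # every contributor consented: always safe
            continue
        # Hypothetical disclosure test: suppress small groups, and suppress groups
        # where a non-consenting value stands out enough to be inferred from the
        # exact aggregate. The paper's actual statistical test may differ.
        if len(rows) < MIN_GROUP_SIZE:
            continue
        spread = rows[value_col].std()
        if spread == 0:
            continue                                 # identical values: the mean exposes each value
        z_scores = (non_consenting[value_col] - exact_mean).abs() / spread
        if (z_scores > OUTLIER_Z).any():
            continue
        kept.append((group, exact_mean))
    return pd.DataFrame(kept, columns=[group_col, f"avg_{value_col}"])

Every tuple this sketch returns equals the exact aggregate over the full input; suppression, rather than perturbation, is the only privacy mechanism, which mirrors the "subset of the ground truth answers" guarantee described in the abstract.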
Pages: 12