Scoring System for Quantifying the Privacy in Re-Identification of Tabular Datasets

Cited by: 0

Authors:

Folz, Jakob [1,2]

Vidanalage, Manjitha D. [2]

Aufschlaeger, Robert [2]

Almaini, Amar [2]

Heigl, Michael [2]

Fiala, Dalibor [1]

Schramm, Martin [2]

Affiliations:

[1] Univ West Bohemia, Fac Appl Sci, Dept Comp Sci & Engn, Plzen 30100, Czech Republic

[2] Deggendorf Inst Technol, Inst ProtectIT, Fac Comp Sci, D-94469 Deggendorf, Germany

Source:

IEEE Access | 2025, Vol. 13

Keywords:

Data privacy; Measurement; Risk analysis; Open data; Protection; Security; Scalability; Public transportation; Object recognition; Mathematical models; Anonymization; privacy; re-identification risk; GDPR; uniqueness; uniformity; correlation; open data; ANONYMIZATION; TRANSPARENCY; CHALLENGES

DOI:

10.1109/ACCESS.2025.3563309

Chinese Library Classification:

TP [Automation technology, computer technology]

Subject classification code:

0812

Abstract:

This study introduces the System for Calculating Open Data Re-identification Risk (SCORR), a framework for quantifying privacy risks in tabular datasets. SCORR extends conventional metrics such as k-anonymity, l-diversity, and t-closeness with novel metrics, including uniqueness-only risk, uniformity-only risk, correlation-only risk, and Markov Model risk, to identify a broader range of re-identification threats. It efficiently analyses event-level and person-level datasets with categorical and numerical attributes. Experimental evaluations were conducted on three publicly available datasets (OULAD, HID, and Adult) across multiple anonymisation levels. The results indicate that higher anonymisation levels do not always proportionally enhance privacy. While stronger generalisation improves k-anonymity, l-diversity and t-closeness vary significantly across datasets. Uniqueness-only and uniformity-only risk decreased with anonymisation, whereas correlation-only risk remained high. Markov Model risk also remained consistently high, showing little to no improvement regardless of the anonymisation level. Scalability analysis revealed that the conventional metrics and uniqueness-only risk incurred minimal computational overhead, remaining independent of dataset size. Correlation-only and uniformity-only risk, however, required significantly more processing time, and Markov Model risk incurred the highest computational cost. All metrics remained unaffected by the number of quasi-identifiers, except t-closeness, which scaled linearly beyond a certain threshold. A usability evaluation comparing SCORR with the freely available ARX Tool showed that SCORR reduced the number of user interactions required for risk analysis by 59.38%, offering a more streamlined and efficient process. These results confirm SCORR's effectiveness in helping data custodians balance privacy protection and data utility, advancing privacy risk assessment beyond existing tools.
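
For readers unfamiliar with the conventional metrics named in the abstract, the following is a minimal sketch of how k-anonymity and (distinct) l-diversity can be computed on a small table. It is not SCORR's implementation: the function names, column names, and toy records are all hypothetical, and the share of unique records shown at the end is only a simple illustration of the idea of uniqueness, not the paper's uniqueness-only risk metric, whose definition is not given in this record.

```python
from collections import defaultdict


def equivalence_classes(rows, quasi_identifiers):
    """Group records that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        groups[key].append(row)
    return groups


def k_anonymity(rows, quasi_identifiers):
    """k is the size of the smallest equivalence class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len(members) for members in groups.values())


def distinct_l_diversity(rows, quasi_identifiers, sensitive):
    """l is the smallest number of distinct sensitive values in any class."""
    groups = equivalence_classes(rows, quasi_identifiers)
    return min(len({m[sensitive] for m in members}) for members in groups.values())


def unique_record_share(rows, quasi_identifiers):
    """Fraction of records that are alone in their equivalence class
    (illustrative only; not the paper's uniqueness-only risk)."""
    groups = equivalence_classes(rows, quasi_identifiers)
    singles = sum(1 for members in groups.values() if len(members) == 1)
    return singles / len(rows)


# Hypothetical toy data: generalised age and postcode are the quasi-identifiers.
records = [
    {"age": "30-39", "zip": "944**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "944**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "301**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "301**", "diagnosis": "flu"},
    {"age": "50-59", "zip": "301**", "diagnosis": "diabetes"},
]
qis = ["age", "zip"]

k = k_anonymity(records, qis)
l = distinct_l_diversity(records, qis, "diagnosis")
share = unique_record_share(records, qis)
print(k, l, share)
```

In this toy table the single record in the 50-59 age band forms its own equivalence class, so both k and l drop to 1 even though the other classes contain two records each. This illustrates why a single poorly generalised record can dominate such worst-case metrics.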

Citation

Pages: 75727-75743

Page count: 17
