Scoring System for Quantifying the Privacy in Re-Identification of Tabular Datasets

Cited by: 0
Authors
Folz, Jakob [1, 2]
Vidanalage, Manjitha D. [2]
Aufschlaeger, Robert [2]
Almaini, Amar [2]
Heigl, Michael [2]
Fiala, Dalibor [1]
Schramm, Martin [2]
Affiliations
[1] Univ West Bohemia, Fac Appl Sci, Dept Comp Sci & Engn, Plzen 30100, Czech Republic
[2] Deggendorf Inst Technol, Inst ProtectIT, Fac Comp Sci, D-94469 Deggendorf, Germany
Keywords
Data privacy; Measurement; Risk analysis; Open data; Protection; Security; Scalability; Public transportation; Object recognition; Mathematical models; Anonymization; privacy; re-identification risk; GDPR; uniqueness; uniformity; correlation; open data; ANONYMIZATION; TRANSPARENCY; CHALLENGES
DOI
10.1109/ACCESS.2025.3563309
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This study introduces a System for Calculating Open Data Re-identification Risk (SCORR), a framework for quantifying privacy risks in tabular datasets. SCORR extends conventional metrics such as k-anonymity, l-diversity, and t-closeness with novel extended metrics, including uniqueness-only risk, uniformity-only risk, correlation-only risk, and Markov Model risk, to identify a broader range of re-identification threats. It efficiently analyses event-level and person-level datasets with categorical and numerical attributes. Experimental evaluations were conducted on three publicly available datasets (OULAD, HID, and Adult) across multiple anonymisation levels. The results indicate that higher anonymisation levels do not always proportionally enhance privacy. While stronger generalisation improves k-anonymity, l-diversity and t-closeness vary significantly across datasets. Uniqueness-only and uniformity-only risk decreased with anonymisation, whereas correlation-only risk remained high. Meanwhile, Markov Model risk consistently remained high, indicating little to no improvement regardless of the anonymisation level. Scalability analysis revealed that conventional metrics and uniqueness-only risk incurred minimal computational overhead, remaining independent of dataset size. However, correlation-only and uniformity-only risk required significantly more processing time, while Markov Model risk incurred the highest computational cost. Despite this, all metrics remained unaffected by the number of quasi-identifiers, except t-closeness, which scaled linearly beyond a certain threshold. A usability evaluation comparing SCORR with the freely available ARX Tool showed that SCORR reduced the number of user interactions required for risk analysis by 59.38%, offering a more streamlined and efficient process. These results confirm SCORR's effectiveness in helping data custodians balance privacy protection and data utility, advancing privacy risk assessment beyond existing tools.
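For readers unfamiliar with the conventional metrics named in the abstract, the following is a minimal sketch of the textbook definitions of k-anonymity and l-diversity, plus a simple share of records that are unique on their quasi-identifiers. It is not SCORR's implementation, and the unique-record share is only an illustrative stand-in, not the paper's uniqueness-only risk. The function names, column names, and toy rows are all hypothetical.

```python
from collections import Counter, defaultdict

def _key(row, quasi_identifiers):
    # Records sharing this tuple form one equivalence class.
    return tuple(row[q] for q in quasi_identifiers)

def k_anonymity(rows, quasi_identifiers):
    # k is the size of the smallest equivalence class.
    classes = Counter(_key(r, quasi_identifiers) for r in rows)
    return min(classes.values())

def l_diversity(rows, quasi_identifiers, sensitive):
    # Distinct l-diversity: fewest distinct sensitive values in any class.
    values = defaultdict(set)
    for r in rows:
        values[_key(r, quasi_identifiers)].add(r[sensitive])
    return min(len(v) for v in values.values())

def unique_share(rows, quasi_identifiers):
    # Fraction of records that are alone in their equivalence class.
    classes = Counter(_key(r, quasi_identifiers) for r in rows)
    return sum(1 for c in classes.values() if c == 1) / len(rows)

# Hypothetical toy data, already generalised (age bands, masked postcodes).
QI = ["age", "zip"]
rows = [
    {"age": "30-39", "zip": "944**", "diagnosis": "flu"},
    {"age": "30-39", "zip": "944**", "diagnosis": "asthma"},
    {"age": "40-49", "zip": "944**", "diagnosis": "flu"},
    {"age": "40-49", "zip": "944**", "diagnosis": "flu"},
    {"age": "50-59", "zip": "301**", "diagnosis": "diabetes"},
]

print(k_anonymity(rows, QI))               # 1: the 50-59 record stands alone
print(l_diversity(rows, QI, "diagnosis"))  # 1: the 40-49 class has only "flu"
print(unique_share(rows, QI))              # 0.2: one unique record out of five
```

In this toy table a single outlying record drives k down to 1 even though most records share a class, which illustrates why the abstract reports that stronger generalisation does not improve every metric proportionally.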
Pages: 75727-75743
Page count: 17