Distribution-Agnostic Database De-Anonymization Under Synchronization Errors

Cited by: 2
Authors
Bakirtas, Serhat [1]
Erkip, Elza [1]
Affiliations
[1] NYU, Tandon Sch Engn, New York, NY 10003 USA
Source
2023 IEEE INTERNATIONAL WORKSHOP ON INFORMATION FORENSICS AND SECURITY, WIFS | 2023
Funding
National Science Foundation (USA);
Keywords
dataset; database; matching; de-anonymization; alignment; distribution-agnostic; privacy; synchronization; obfuscation;
DOI
10.1109/WIFS58808.2023.10374831
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
There has recently been an increased scientific interest in the de-anonymization of users in anonymized databases containing user-level microdata via multifarious matching strategies utilizing publicly available correlated data. Existing literature has either emphasized practical aspects where underlying data distribution is not required, with limited or no theoretical guarantees, or theoretical aspects with the assumption of complete availability of underlying distributions. In this work, we take a step towards reconciling these two lines of work by providing theoretical guarantees for the de-anonymization of random correlated databases without prior knowledge of data distribution. Motivated by time-indexed microdata, we consider database de-anonymization under both synchronization errors (column repetitions) and obfuscation (noise). By modifying the previously used replica detection algorithm to accommodate for the unknown underlying distribution, proposing a new seeded deletion detection algorithm, and employing statistical and information-theoretic tools, we derive sufficient conditions on the database growth rate for successful matching. Our findings demonstrate that a double-logarithmic seed size relative to row size ensures successful deletion detection. More importantly, we show that the derived sufficient conditions are the same as in the distribution-aware setting, negating any asymptotic loss of performance due to unknown underlying distributions.
Pages: 6
Related papers
24 items in total
[1]  
Bakirtas S., 2022, 2022 56 AS C SIGN SY
[2]   Database Matching Under Column Deletions [J].
Bakirtas, Serhat ;
Erkip, Elza .
2021 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2021, :2720-2725
[3]  
Bakirtas S, 2023, Arxiv, DOI arXiv:2309.14484
[4]  
Bakirtas S, 2023, Arxiv, DOI arXiv:2301.06796
[5]   Database Matching Under Adversarial Column Deletions [J].
Bakirtas, Serhat ;
Erkip, Elza .
2023 IEEE INFORMATION THEORY WORKSHOP, ITW, 2023, :181-185
[6]   Seeded Database Matching Under Noisy Column Repetitions [J].
Bakirtas, Serhat ;
Erkip, Elza .
2022 IEEE INFORMATION THEORY WORKSHOP (ITW), 2022, :386-391
[7]   MOMENT ESTIMATORS FOR PARAMETERS OF A MIXTURE OF 2 BINOMIAL DISTRIBUTIONS [J].
BLISCHKE, WR .
ANNALS OF MATHEMATICAL STATISTICS, 1962, 33 (02) :444-&
[8]  
Cormen T.H., 2022, Introduction to Algorithms, 4th ed.
[9]  
Cover T. M., Thomas J. A., 2006, Elements of Information Theory, 2nd ed.
[10]  
Cullina D, 2018, IEEE INT SYMP INFO, P651, DOI 10.1109/ISIT.2018.8437908