De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice

被引:23
作者
Wang, Huandong [1 ]
Gao, Chen [1 ]
Li, Yong [1 ]
Wang, Gang [2 ]
Jin, Depeng [1 ]
Sun, Jingbo [3 ]
机构
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Virginia Tech, Dept Comp Sci, Blacksburg, VA USA
[3] China Telecom Beijing Res Inst, Beijing, Peoples R China
来源
25TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2018) | 2018年
关键词
ANONYMITY;
D O I
10.14722/ndss.2018.23211
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is a growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g. online social networks). To understand this risk, prior works either estimate the theoretical privacy bound or simulate de-anonymization attacks on synthetically created (small) datasets. However, it is not clear how well the theoretical estimations are preserved in practice. In this paper, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (7 in total). We find that their performance in the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms have under-estimated the impact of spatio-temporal mismatches between the data from different sources, and the high sparsity of user generated data also contributes to the underperformance. Based on these insights, we propose 4 new algorithms that are specially designed to tolerate spatial or temporal mismatches (or both) and model user behavior. Extensive evaluations show that our algorithms achieve more than 17% performance gain over the best existing algorithms, confirming our insights.
引用
收藏
页数:15
相关论文
共 42 条
[1]   Anonymization of moving objects databases by clustering and perturbation [J].
Abul, Osman ;
Bonchi, Francesco ;
Nanni, Mirco .
INFORMATION SYSTEMS, 2010, 35 (08) :884-910
[2]  
Abul O, 2008, PROC INT CONF DATA, P376, DOI 10.1109/ICDE.2008.4497446
[3]   A Case Study: Privacy Preserving Release of Spatio-temporal Density in Paris [J].
Acs, Gergely ;
Castelluccia, Claude .
PROCEEDINGS OF THE 20TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING (KDD'14), 2014, :1679-1688
[4]  
Akim E., 2003, P ISSFD
[5]  
Andres M. E., 2013, P ACM SIGSAC C COMP, P901
[6]  
[Anonymous], 2013, P 22 INT C WORLD WID, DOI DOI 10.1145/2488388.2488428
[7]  
[Anonymous], 2008, Introduction to information retrieval
[8]  
[Anonymous], 2016, SIGKDD Explorations, DOI 10.1145/3068777.3068781
[9]  
Banerjee N, 2007, LECT NOTES COMPUT SC, V4717, P217
[10]  
Bishop C.M., 2006, Machine Learning, V128