De-anonymization of Mobility Trajectories: Dissecting the Gaps between Theory and Practice

被引:23
作者
Wang, Huandong [1 ]
Gao, Chen [1 ]
Li, Yong [1 ]
Wang, Gang [2 ]
Jin, Depeng [1 ]
Sun, Jingbo [3 ]
机构
[1] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
[2] Virginia Tech, Dept Comp Sci, Blacksburg, VA USA
[3] China Telecom Beijing Res Inst, Beijing, Peoples R China
来源
25TH ANNUAL NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM (NDSS 2018) | 2018年
关键词
ANONYMITY;
D O I
10.14722/ndss.2018.23211
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Human mobility trajectories are increasingly collected by ISPs to assist academic research and commercial applications. Meanwhile, there is a growing concern that individual trajectories can be de-anonymized when the data is shared, using information from external sources (e.g. online social networks). To understand this risk, prior works either estimate the theoretical privacy bound or simulate de-anonymization attacks on synthetically created (small) datasets. However, it is not clear how well the theoretical estimations are preserved in practice. In this paper, we collected a large-scale ground-truth trajectory dataset from 2,161,500 users of a cellular network, and two matched external trajectory datasets from a large social network (56,683 users) and a check-in/review service (45,790 users) on the same user population. The two sets of large ground-truth data provide a rare opportunity to extensively evaluate a variety of de-anonymization algorithms (7 in total). We find that their performance in the real-world dataset is far from the theoretical bound. Further analysis shows that most algorithms have under-estimated the impact of spatio-temporal mismatches between the data from different sources, and the high sparsity of user generated data also contributes to the underperformance. Based on these insights, we propose 4 new algorithms that are specially designed to tolerate spatial or temporal mismatches (or both) and model user behavior. Extensive evaluations show that our algorithms achieve more than 17% performance gain over the best existing algorithms, confirming our insights.
引用
收藏
页数:15
相关论文
共 42 条
[31]   De-anonymizing Social Networks [J].
Narayanan, Arvind ;
Shmatikov, Vitaly .
PROCEEDINGS OF THE 2009 30TH IEEE SYMPOSIUM ON SECURITY AND PRIVACY, 2009, :173-187
[32]   Back to the Drawing Board: Revisiting the Design of Optimal Location Privacy-preserving Mechanisms [J].
Oya, Simon ;
Troncoso, Carmela ;
Perez-Gonzalez, Fernando .
CCS'17: PROCEEDINGS OF THE 2017 ACM SIGSAC CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2017, :1959-1972
[33]   Linking Users Across Domains with Location Data: Theory and Validation [J].
Riederer, Chris ;
Kim, Yunsung ;
Chaintreau, Augustin ;
Korula, Nitish ;
Lattanzi, Silvio .
PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16), 2016, :707-719
[34]   Quantifying Location Privacy [J].
Shokri, Reza ;
Theodorakopoulos, George ;
Le Boudec, Jean-Yves ;
Hubaux, Jean-Pierre .
2011 IEEE SYMPOSIUM ON SECURITY AND PRIVACY (SP 2011), 2011, :247-262
[35]  
Srivatsa Mudhakar., 2012, ACM C COMP COMM SEC, P628, DOI [https://doi.org/10.1145/2382196.2382262, DOI 10.1145/2382196.2382262]
[36]   k-anonymity:: A model for protecting privacy [J].
Sweeney, L .
INTERNATIONAL JOURNAL OF UNCERTAINTY FUZZINESS AND KNOWLEDGE-BASED SYSTEMS, 2002, 10 (05) :557-570
[37]  
Thomas M., 2006, Elements of Information Theory
[38]  
Valcarce D., 2016, P CERI
[39]  
Wang Gang., 2016, P 10 INT AAAI C WEB, P417
[40]  
Zang H., 2011, P 17 ANN INT C MOB C, P145, DOI [DOI 10.1145/2030613.2030630, 10.1145/2030613.2030630]