Re-identification and information fusion between anonymized CDR and social network data

Cited by: 32
Authors
Cecaj, Alket [1]
Mamei, Marco [1]
Zambonelli, Franco [1]
Affiliations
[1] Univ Modena & Reggio Emilia, Reggio Emilia, Italy
Keywords
Mobility patterns; De-anonymization; Information fusion
DOI
10.1007/s12652-015-0303-x
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The analysis of multiple datasets on users' behaviors opens interesting information fusion possibilities and, at the same time, creates a potential for re-identification and de-anonymization of users' data. On the one hand, this kind of approach can breach users' privacy despite anonymization. On the other hand, combining different datasets is a key enabler for advanced context-awareness, in that information from multiple sources can complement and enrich each other. In this work we analyze different anonymized mobility datasets to highlight re-identification and information fusion possibilities. In particular, we focus on call detail record (CDR) datasets released by mobile telecom operators and datasets comprising geo-localized messages released by social network sites. Results show that: (1) in line with previous findings, few (about 4) data points are enough to uniquely pinpoint the majority (90%) of the users; (2) more than 20% of CDR users have a single social network user exhibiting a number of matching data points, and we speculate that these two users might be the same person; (3) we derive an estimate of the probability of two users being the same person given the number of data points they have in common, and estimate that for 3% of the social network users we can find a CDR user very likely (>90% probability) to be the same person.
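The idea of counting matching data points between the two kinds of dataset can be illustrated with a minimal sketch. The abstract does not describe how the authors define or compute a match, so everything here is an assumption made for illustration: each data point is reduced to a hypothetical `(cell, time_bin)` tuple, two points match only when the tuples are equal, and the function names `count_matches` and `unique_candidates` are invented for this example.

```python
from collections import defaultdict


def count_matches(cdr_points, sn_points):
    """Count shared (cell, time_bin) points for every CDR / social network user pair.

    Both arguments map a user id to a set of (cell, time_bin) tuples.
    The tuple representation is an assumption, not the paper's definition.
    """
    # Invert the social network data: point -> users observed at that point.
    index = defaultdict(set)
    for sn_user, points in sn_points.items():
        for point in points:
            index[point].add(sn_user)

    # For each CDR point, credit every social network user seen at the same point.
    matches = defaultdict(int)
    for cdr_user, points in cdr_points.items():
        for point in points:
            for sn_user in index.get(point, ()):
                matches[(cdr_user, sn_user)] += 1
    return dict(matches)


def unique_candidates(matches, min_points):
    """Return CDR users that have exactly one social network user
    sharing at least min_points data points with them."""
    candidates = defaultdict(list)
    for (cdr_user, sn_user), n in matches.items():
        if n >= min_points:
            candidates[cdr_user].append(sn_user)
    return {c: users[0] for c, users in candidates.items() if len(users) == 1}


# Toy data, invented for illustration only.
cdr = {"c1": {("A", 1), ("B", 2), ("C", 3)}, "c2": {("A", 1), ("D", 4)}}
sn = {"s1": {("A", 1), ("B", 2), ("C", 3)}, "s2": {("D", 4)}}
pairs = unique_candidates(count_matches(cdr, sn), min_points=2)
```

With the toy data, only `c1` has a single social network user (`s1`) sharing at least two points. Turning such counts into a probability of being the same person, as the abstract describes, would need a further model that this sketch does not attempt.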
Pages: 83-96
Number of pages: 14