Linking Users Across Domains with Location Data: Theory and Validation

被引:124
作者
Riederer, Chris [1 ]
Kim, Yunsung [1 ]
Chaintreau, Augustin [1 ]
Korula, Nitish [2 ]
Lattanzi, Silvio [2 ]
机构
[1] Columbia Univ, New York, NY 10027 USA
[2] Google Res, New York, NY USA
来源
PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON WORLD WIDE WEB (WWW'16) | 2016年
基金
美国国家科学基金会;
关键词
D O I
10.1145/2872427.2883002
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
Linking accounts of the same user across datasets even when personally identifying information is removed or unavailable is an important open problem studied in many contexts. Beyond many practical applications, (such as cross domain analysis, recommendation, and link prediction), understanding this problem more generally informs us on the privacy implications of data disclosure. Previous work has typically addressed this question using either different portions of the same dataset or observing the same behavior across thematically similar domains. In contrast, the general cross-domain case where users have different profiles independently generated from a common but unknown pattern raises new challenges, including difficulties in validation, and remains under-explored. In this paper, we address the reconciliation problem for location-based datasets and introduce a robust method for this general setting. Location datasets are a particularly fruitful domain to study: such records are frequently produced by users in an increasing number of applications and are highly sensitive, especially when linked to other data sets. Our main contribution is a generic and self-tunable algorithm that leverages any pair of sporadic location-based datasets to determine the most likely matching between the users it contains. While making very general assumptions on the patterns of mobile users, we show that the maximum weight matching we compute is provably correct. Although true cross-domain datasets are a rarity, our experimental evaluation uses two entirely new data collections, including one we crawled, on an unprecedented scale. The method we design outperforms naive rules and prior heuristics. As it combines both sparse and dense properties of location-based data and accounts for probabilistic dynamics of observation, it can be shown to be robust even when data gets sparse.
引用
收藏
页码:707 / 719
页数:13
相关论文
共 27 条
  • [1] [Anonymous], 2013, P 22 INT C WORLD WID, DOI DOI 10.1145/2488388.2488428
  • [2] [Anonymous], 2012, DATA MATCHING CONCEP, DOI DOI 10.1007/978-3-642-31164-2
  • [3] Algorithms for Large, Sparse Network Alignment Problems
    Bayati, Mohsen
    Gerritsen, Margot
    Gleich, David F.
    Saberi, Amin
    Wang, Ying
    [J]. 2009 9TH IEEE INTERNATIONAL CONFERENCE ON DATA MINING, 2009, : 705 - +
  • [4] Cecaj A, 2014, INT CONF PERVAS COMP, P237, DOI 10.1109/PerComW.2014.6815210
  • [5] Cecaj Alket., 2015, Journal of Ambient Intelligence and Humanized Computing, V7, P1
  • [6] Cho E., 2011, KD11 17 ACM SIGKDD I, P1082
  • [7] Inferring social ties from geographic coincidences
    Crandall, David J.
    Backstrom, Lars
    Cosley, Dan
    Suri, Siddharth
    Huttenlocher, Daniel
    Kleinberg, Jon
    [J]. PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2010, 107 (52) : 22436 - 22441
  • [8] Unique in the shopping mall: On the reidentifiability of credit card metadata
    de Montjoye, Yves-Alexandre
    Radaelli, Laura
    Singh, Vivek Kumar
    Pentland, Alex Sandy
    [J]. SCIENCE, 2015, 347 (6221) : 536 - 539
  • [9] Unique in the Crowd: The privacy bounds of human mobility
    de Montjoye, Yves-Alexandre
    Hidalgo, Cesar A.
    Verleysen, Michel
    Blondel, Vincent D.
    [J]. SCIENTIFIC REPORTS, 2013, 3
  • [10] On the Reliability of Profile Matching Across Large Online Social Networks
    Goga, Oana
    Loiseau, Patrick
    Sommer, Robin
    Teixeira, Renata
    Gummadi, Krishna P.
    [J]. KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, : 1799 - 1808