Spatio-textual user matching and clustering based on set similarity joins

Cited by: 14
Authors
Belesiotis, Alexandros [1]
Skoutas, Dimitrios [1]
Efstathiades, Christodoulos [2]
Kaffes, Vassilis [1]
Pfoser, Dieter [3]
Affiliations
[1] RC Athena, IMIS, Athens, Greece
[2] European Univ Cyprus, Nicosia, Cyprus
[3] George Mason Univ, Fairfax, VA 22030 USA
Funding
EU Horizon 2020
Keywords
Spatio-textual join; Set similarity join; Spatio-textual clustering; Community structure; Search
DOI
10.1007/s00778-018-0498-5
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
This paper addresses the problem of matching and clustering users based on their geolocated posts. Individual posts are matched according to spatial distance and textual similarity thresholds. User similarity is then defined as the ratio of the users' posts that match each other. Based on these criteria, we introduce efficient algorithms for identifying pairs of matching users in a large dataset, as well as for computing the top-k matching pairs. We then proceed to identify spatio-textual user clusters. For this purpose, we use the Louvain method for community detection. Our algorithms operate on a user graph in which edge weights represent spatio-textual user similarities. Since the exact user similarity graph can be prohibitively expensive to compute, we exploit our previous algorithms to derive efficient methods that reduce execution time both by avoiding the computation of exact similarity scores and by reducing the number of similarity calculations performed. The presented solution allows a trade-off between computation time and the quality of the detected clusters. The proposed algorithms are evaluated using three real-world datasets.
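The matching criterion described in the abstract can be illustrated with a naive, quadratic sketch. It is not the paper's algorithm, which relies on set similarity joins for efficiency. Several details are assumptions made for illustration only: the abstract does not name a textual similarity measure, so Jaccard similarity over keyword sets is used here; distance is plain Euclidean; the "ratio of posts that match" is read as the number of posts of either user that have at least one matching post from the other user, divided by the total number of posts. The names `Post`, `posts_match`, `user_similarity`, `max_dist`, and `min_sim` are hypothetical.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Post:
    """A geolocated post: planar coordinates plus a set of keywords."""
    x: float
    y: float
    keywords: frozenset


def jaccard(a, b):
    """Jaccard similarity of two sets (assumed textual similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def posts_match(p, q, max_dist, min_sim):
    """Two posts match if they are close in space and similar in text."""
    close = hypot(p.x - q.x, p.y - q.y) <= max_dist
    return close and jaccard(p.keywords, q.keywords) >= min_sim


def user_similarity(posts_u, posts_v, max_dist, min_sim):
    """Fraction of both users' posts that have a match from the other user.

    Naive O(|posts_u| * |posts_v|) evaluation, for illustration only.
    """
    total = len(posts_u) + len(posts_v)
    if total == 0:
        return 0.0
    matched_u = sum(
        any(posts_match(p, q, max_dist, min_sim) for q in posts_v)
        for p in posts_u
    )
    matched_v = sum(
        any(posts_match(q, p, max_dist, min_sim) for p in posts_u)
        for q in posts_v
    )
    return (matched_u + matched_v) / total
```

Under this reading, pairs of users whose similarity exceeds a chosen threshold would become weighted edges of the user graph on which community detection is run.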
Pages: 297-320
Page count: 24